{
  "log": [
    {
      "commit": "14aaf84f9f7b6c303b08b13b3d544c36a33873b2",
      "tree": "5ff28ac7610ddfda0f49b50da6a6dbbcf79db813",
      "parents": [
        "b80247f1200087aa4985ad7c67ae2cbed82b33ce"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Thu Jul 23 23:09:28 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 23:09:28 2026 +0800"
      },
      "message": "[test](regression) Complete Iceberg/Paimon schema time travel P0 coverage (#65960)\n\n## What\n\n- add P0 Iceberg/Paimon schema-evolution matrices combined with\nsnapshot, tag, branch, time travel, delete/upsert and reader/cache\nvariants\n- split independent dimensions into separate Groovy suites so regression\ncan execute them concurrently\n- add stable negative regressions for unsupported format operations and\nconfirmed product issues\n- extend JDBC catalog cases with rename plus old snapshot/tag reads\n- make JDBC setup topology-neutral by distributing drivers to every\nFE/BE node\n- add a checked-in coverage document that maps schema operations,\nhistorical references, delete modes and suite ownership\n\n## Scope\n\nTests and test documentation only. No production code is changed.\n\n## Validation\n\n- full FE/BE build passed\n- regression framework compile passed\n- core matrix: 10 suites, 0 failed, 0 fatal, 0 skipped\n- JDBC catalog matrix: 2 suites, 0 failed, 0 fatal, 0 skipped\n- source formatting checks passed\n- changed Groovy files contain scenario comments and no Jira references\n\nThe branch was validated successfully and was not rebased after\nvalidation, as requested."
    },
    {
      "commit": "b80247f1200087aa4985ad7c67ae2cbed82b33ce",
      "tree": "554f55ed886be2bd884d66fd8f968efa6ada7dcd",
      "parents": [
        "805b746addd7496e2df520d85b8fc6fe006f8636"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Thu Jul 23 21:47:46 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 21:47:46 2026 +0800"
      },
      "message": "[fix](file) Resolve remaining FileScannerV2 audit issues (#65931)\n\n### What problem does this PR solve?\n\nThis follow-up audits the 10 unresolved review threads on\nhttps://github.com/apache/doris/pull/65674 and every child of\nDORIS-27038.\n\n- Three unresolved threads were already fixed on master.\n- This PR fixes the remaining seven review findings.\n\n### What is changed?\n\n- Harden Parquet delta geometry, page allocation validation, schema\nambiguity handling, dictionary reuse, decompression scratch lifetime,\nand one-child MAP_KEY_VALUE SET parsing.\n- Validate delete expression result ownership before erasing temporary\ncolumns.\n- Scope Iceberg row-lineage virtual columns to Iceberg readers.\n- Propagate Remote Doris Flight timeout and cancellation.\n- Reject NULL JDBC special-type casts for non-nullable targets.\n- Aggregate hybrid Paimon/Hudi condition-cache hit counters.\n- Remove redundant JSON-line copies and Hive key allocations.\n\nThe scanner V1 path under `be/src/format` is intentionally untouched.\n\n### Verification\n\n- BE ASAN unit tests: 172 tests from 12 related suites passed.\n- clang-format 16 dry-run passed for every changed C++ file.\n- `git diff --check` passed.\n- Confirmed no diff under `be/src/format`.\n\n---------\n\nSigned-off-by: Gabriel \u003cliwenqiang@selectdb.com\u003e"
    },
    {
      "commit": "805b746addd7496e2df520d85b8fc6fe006f8636",
      "tree": "a412328bdf3c65c208e939d06cd429e0ec867ef5",
      "parents": [
        "8a76160b6460fbf8c406c3e6b770142e43719200"
      ],
      "author": {
        "name": "Dongyang Li",
        "email": "lidongyang@selectdb.com",
        "time": "Thu Jul 23 20:11:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 20:11:47 2026 +0800"
      },
      "message": "[fix](regression) use JDK 17 for Java UDF builds (#65888)\n\n## What changed\n\n- make `run-regression-test.sh` select and validate JDK 17 before\ninvoking Maven\n- prefer a valid `JAVA_HOME` or `JDK_17`, with Linux and macOS discovery\nfallbacks\n- build the regression framework and Java UDF case jar with the same JDK\n17\n- compile `java-udf-src` with `maven.compiler.release\u003d17`\n\n## Why\n\nThe regression build depended on the caller to prepare `JAVA_HOME`, and\nnewer branches also switched to JDK 8 only for the Java UDF module. This\nsplit ownership between CI and the Doris build script and could produce\nincomplete case artifacts when the framework required JDK 17.\n\nKeeping Java selection in `run-regression-test.sh` makes local and CI\nbuilds follow the same contract and removes the Java 8-only UDF build\npath.\n\n## Validation\n\n- `bash -n run-regression-test.sh`\n- `git diff --check`\n- ran `./run-regression-test.sh --clean` while the incoming `JAVA_HOME`\npointed to JDK 22; the script selected JDK 17 and Maven reported Java 17\n- built `regression-test/java-udf-src` with JDK 17 successfully\n- verified `Echo$EchoInt` has class major version 61 and the expected\nUDF classes are present in the assembled jar"
    },
    {
      "commit": "8a76160b6460fbf8c406c3e6b770142e43719200",
      "tree": "a3acf889e05e2f98da963bfca2cd7a3e0021e608",
      "parents": [
        "7038bb70bf25d2b5d23600fc0b0dd7f02f63abd4"
      ],
      "author": {
        "name": "wudi",
        "email": "wudi@selectdb.com",
        "time": "Thu Jul 23 18:50:43 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 18:50:43 2026 +0800"
      },
      "message": "[feature](streaming-job) Support OceanBase CDC streaming jobs (#65588)\n\n### What problem does this PR solve?\n\nRelated PR: #65325\n\nStreaming jobs currently support MySQL and PostgreSQL CDC sources but\ncannot use OceanBase as a source.\n\nThis change adds OceanBase MySQL compatibility mode as a streaming job\ndata source. It introduces OceanBase source type handling in FE and the\nCDC client, reuses the MySQL-compatible CDC processing path, validates\nsource configuration and compatibility mode before creating target\ntables, and supports initial, snapshot, latest, earliest, and specific\nstartup offsets.\n\nThe OceanBase third-party environment exposes separate JDBC and CDC\nports. Test coverage includes basic synchronization, data types, startup\noffsets, schema changes, and FE restart recovery."
    },
    {
      "commit": "7038bb70bf25d2b5d23600fc0b0dd7f02f63abd4",
      "tree": "b434acb3eb5a3b6c9fb722676a63ade27cc46b8e",
      "parents": [
        "e0d8db50e1af651a2d7ec5fa5f2a40f3a2f8589f"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Thu Jul 23 17:37:59 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 17:37:59 2026 +0800"
      },
      "message": "[opt](expr) Vectorize AVG serialized state merge (#65729)\n\n### What problem does this PR solve?\n\n\nClang preserves the original floating-point accumulation order when\nmerging serialized AVG states,\npreventing vectorization of this contiguous hot loop. This change adds a\nreusable reassociation hint\nand applies it to AVG range merge. Existing raw Clang pragmas in AVG,\nSUM, and AVG_WEIGHTED are also\nreplaced with the shared macro.\n\nThe focused benchmark reduced CPU time by 35.6% for 4K states and 38.5%\nfor 64K states."
    },
    {
      "commit": "e0d8db50e1af651a2d7ec5fa5f2a40f3a2f8589f",
      "tree": "31269b7466934bed7703c13349be2515f392bae1",
      "parents": [
        "27cb451229c7a42dd089c5248847828eb0536d02"
      ],
      "author": {
        "name": "zhengyu",
        "email": "zhangzhengyu@selectdb.com",
        "time": "Thu Jul 23 17:31:21 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 17:31:21 2026 +0800"
      },
      "message": "[fix](filecache) fix flaky be UT for LRU dump (#65427)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: `BlockFileCacheTest.version3_add_remove_restart` relied\non sleeping for the background LRU dump thread after replaying remove\nrecords. If the background dump did not run before the restart phase,\nstale LRU tail records could be restored and the restarted cache could\nreport non-empty queues. The test now explicitly dumps LRU queues after\nreplaying the remove logs, so persisted LRU state is synchronized before\nthe restart check without changing file cache runtime code.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test:\n- Unit Test: `DORIS_TOOLCHAIN\u003dclang DISABLE_BE_JAVA_EXTENSIONS\u003dON\nENABLE_INJECTION_POINT\u003dON ENABLE_CACHE_LOCK_DEBUG\u003d0 ENABLE_PCH\u003d0 sh\nrun-be-ut.sh --run\n--filter\u003dBlockFileCacheTest.version3_add_remove_restart`\n- Style check: `git diff --check --\nbe/test/io/cache/block_file_cache_test_meta_store.cpp`\n- Behavior changed: No\n- Does this need documentation: No"
    },
    {
      "commit": "27cb451229c7a42dd089c5248847828eb0536d02",
      "tree": "e462a05b35464228cac32602ff54138811611699",
      "parents": [
        "bd16e5d68fd280343ccf6da61b2d47a9d3c2615d"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Thu Jul 23 16:52:33 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 16:52:33 2026 +0800"
      },
      "message": "[improvement](file) Execute split residual predicates in TableReader (#65925)\n\n## What changed\n\n- Add a split-local localization result so `TableReader` knows which\ntable predicates are enforced exactly by the current `FileReader`.\n- Evaluate only unlocalized predicates at the end of\n`TableReader::finalize_chunk`, after schema evolution, defaults,\npartition values, and virtual columns are materialized.\n- Make `FileScannerV2` skip row-level conjunct evaluation; it keeps\nscheduling and accounting responsibilities only.\n- Preserve predicate order by keeping unsafe/stateful predicates and\nevery later predicate on the table-materialization path.\n- Apply the same table-level residual filtering to the Iceberg\nposition-delete system-table reader.\n\n## Why\n\n`FileScannerV2` previously reevaluated every original conjunct after\n`FileReader` had already enforced localized predicates. This duplicated\nexpression work and made predicate execution ownership unclear.\nLocalization also depends on each split\u0027s physical schema, so ownership\ncannot be cached at scanner or table-reader lifetime scope.\n\nThe new contract recomputes ownership for every split: localized\npredicates belong to `FileReader`, while all remaining predicates belong\nto `TableReader` after final materialization.\n\n## Validation\n\n- ASAN BE UT build: `cmake --build be/ut_build_ASAN --target\ndoris_be_test -j128`\n- 287 focused TableReader/ColumnMapper/Iceberg/FileScannerV2 tests\npassed.\n- 41 JNI/Hudi/Paimon reader tests passed.\n- `build-support/check-format.sh` passed with clang-format 16.0.6.\n- clang-tidy was attempted but could not analyze the tree because of\nexisting toolchain/baseline errors (`stddef.h` not found and an\nunmatched `NOLINTEND` in `core/types.h`)."
    },
    {
      "commit": "bd16e5d68fd280343ccf6da61b2d47a9d3c2615d",
      "tree": "b78014f01ce79733407c023b17985b971af3c657",
      "parents": [
        "8a1bf788e290dc5832c4bd432a34d0e8b4c42906"
      ],
      "author": {
        "name": "zclllyybb",
        "email": "zhaochangle@selectdb.com",
        "time": "Thu Jul 23 16:45:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 16:45:17 2026 +0800"
      },
      "message": "[fix](fe) Guard auto partition result against concurrent drops (#65282)\n\nProblem Summary: Auto partition creation can race with dynamic partition\nretention. After new partitions are added, the response builder reads\nthe table partition map and partition info without holding the table\nmetadata lock. If a retention drop removes one of those partitions in\nthe same window, the response builder can observe a missing partition or\npartition item and hit a null pointer while constructing partition and\ntablet metadata. This change snapshots the required partition metadata\nunder the table read lock, returns a normal error when the partition was\nalready dropped before the snapshot, and keeps tablet\nlocation/cache/backend resolution work outside the table lock."
    },
    {
      "commit": "8a1bf788e290dc5832c4bd432a34d0e8b4c42906",
      "tree": "bb5b0669bc207b381a89c33ada9f267713999a6e",
      "parents": [
        "67489f6e0062a9c3a4f18529633c7dfc33ad0999"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Thu Jul 23 15:39:52 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 15:39:52 2026 +0800"
      },
      "message": "[fix](replica) Resolve conflicting default replica properties (#65836)\n\nProblem Summary: ALTER TABLE updates to default.replication_allocation\ncould leave the legacy default.replication_num property in table\nmetadata. Because replica analysis checks replication_num first, SHOW\nCREATE TABLE and later consumers could observe the wrong default\nallocation. Remove the mutually exclusive property when applying an\nupdate, prefer replication_allocation when cleaning historical metadata,\nand cover allocation updates, metadata cleanup, and reverse numeric\nupdates."
    },
    {
      "commit": "67489f6e0062a9c3a4f18529633c7dfc33ad0999",
      "tree": "838620a7efe4671db92a871e76ec72c1f0f98518",
      "parents": [
        "7c922d5d2ea7fb1b8929668cb2757c9c4ea75824"
      ],
      "author": {
        "name": "Arpit Jain",
        "email": "3242828+arpitjain099@users.noreply.github.com",
        "time": "Thu Jul 23 16:33:32 2026 +0900"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 15:33:32 2026 +0800"
      },
      "message": "[fix](build) Fix insecure Python dependency pins in doris-compose, dbt-doris, and cost_model (#64928)\n\n### What problem does this PR solve?\n\nIssue Number: N/A\n\nRelated PR: #63188\n\nProblem Summary:\nThree Python requirements files have dependency pins that allow\ninstalling versions with known security issues.\n\n**docker/runtime/doris-compose/requirements.txt**\n\nThe `requests\u003c\u003d2.31.0` constraint sets an upper bound but no lower\nbound, so pip can resolve to any version below 2.31.0. Versions before\n2.31.0 are affected by CVE-2023-32681 (Proxy-Authorization header leaked\non cross-domain redirects). Changed to `requests\u003e\u003d2.31.0` to set a\nsecure floor.\n\n**extension/dbt-doris/dev-requirements.txt**\n\nThe `mysql-connector-python\u003e\u003d8.0.0,\u003c8.3` lower bound allows versions\nwith known authentication bypass issues. Raised the lower bound to\n`\u003e\u003d8.0.33` which includes the relevant fixes.\n\n**tools/cost_model_evaluate/requirements.txt**\n\nThe `mysql_connector_repackaged\u003d\u003d0.3.1` package is an unofficial\nthird-party repackage of the MySQL connector whose last PyPI release was\nin 2014. Replaced with the official `mysql-connector-python\u003e\u003d8.0.33,\u003c9`\npackage. The code in `sql_executor.py` imports `mysql.connector` which\nboth packages provide, so this is a drop-in replacement."
    },
    {
      "commit": "7c922d5d2ea7fb1b8929668cb2757c9c4ea75824",
      "tree": "6e4da61ecf8e306be722528f114abb393c4dbdf6",
      "parents": [
        "c345e92d93d086da11326d449bc1935829a8d072"
      ],
      "author": {
        "name": "zhengyu",
        "email": "zhangzhengyu@selectdb.com",
        "time": "Thu Jul 23 15:31:50 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 15:31:50 2026 +0800"
      },
      "message": "[fix](file-cache): speed up LRU restore startup path (#65174)\n\n## Proposed changes\n\n- Avoid O(N^2) protobuf repeated-field erase during LRU dump restore by\nreading groups and entries with parser cursors.\n- Add two-phase multi-cache creation so different file cache paths\ninitialize and restore in parallel, then publish caches in the original\nconfig order to keep hash routing stable.\n- Keep duplicate cache path handling and ignore_broken_disk behavior\nconsistent with the previous startup path.\n\n## Testing\n\n- `git diff --check`\n- `DORIS_TOOLCHAIN\u003dclang DISABLE_BE_JAVA_EXTENSIONS\u003dON\nENABLE_INJECTION_POINT\u003dON ENABLE_CACHE_LOCK_DEBUG\u003d0 ENABLE_PCH\u003d0 sh\nrun-be-ut.sh --run\n--filter\u003dCacheLRUDumperTest.*:BlockFileCacheTest.create_file_caches_preserves_config_order`\n- `DORIS_TOOLCHAIN\u003dclang DISABLE_BE_JAVA_EXTENSIONS\u003dON\nENABLE_INJECTION_POINT\u003dON ENABLE_CACHE_LOCK_DEBUG\u003d0 ENABLE_PCH\u003d0 sh\nrun-be-ut.sh --run\n--filter\u003dBlockFileCacheTest.test_lru_log_record_replay_dump_restore:BlockFileCacheTest.test_lru_duplicate_queue_entry_restore:BlockFileCacheTest.lru_restore_size_mismatch_does_not_underflow_on_clear`"
    },
    {
      "commit": "c345e92d93d086da11326d449bc1935829a8d072",
      "tree": "62b2b3ea9fc36d4506225015771570289e78c35b",
      "parents": [
        "07bd92a3b7ef0de1bfaaf626a89b46648951599c"
      ],
      "author": {
        "name": "meiyi",
        "email": "meiyi@selectdb.com",
        "time": "Thu Jul 23 15:17:01 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 15:17:01 2026 +0800"
      },
      "message": "[feature](fe) Add meta service RPC rate limiting in FE (#65694)\n\nAdd configurable FE-side rate limiting for meta service RPCs. The change\nintroduces per-method Resilience4j rate limiters, dynamic config\nvalidation, weighted get-version limiting for batch requests,\nrate-limit-specific metrics, and profile/audit visibility for\nget-version limiter wait time."
    },
    {
      "commit": "07bd92a3b7ef0de1bfaaf626a89b46648951599c",
      "tree": "9de8873496ce63e3beb55cfa3ae37ae834bc2759",
      "parents": [
        "ec5f97257ed93f6cb601c2a3ddcc6eb672171cf4"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Thu Jul 23 15:12:57 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 15:12:57 2026 +0800"
      },
      "message": "[fix](stream) Fix stream scan partition prune state propagation (#65657)\n\n### What problem does this PR solve?\n\nRelated Issue: close #65654\n\n`LogicalOlapTableStreamScan.withSelectedPartitionIds(List\u003cLong\u003e,\nboolean)` diverged from `LogicalOlapScan` and changed the meaning of the\nsecond parameter. In `LogicalOlapScan`, the second parameter represents\n`hasPartitionPredicate`, while the rebuilt scan is always marked as\npartition-pruned. The stream scan override treated that second parameter\nas `isPartitionPruned`, which let partition pruning rebuild a stream\nscan that still looked unpruned and could be matched again by the same\nrewrite rule.\n\nThis PR keeps the stream-scan override consistent with\n`LogicalOlapScan`: the second parameter remains `hasPartitionPredicate`,\nand the rebuilt `LogicalOlapTableStreamScan` is marked as already\npartition-pruned."
    },
    {
      "commit": "ec5f97257ed93f6cb601c2a3ddcc6eb672171cf4",
      "tree": "83eb19a2ea09edd5800f6d01209cf227b1990393",
      "parents": [
        "60ba6aef642fa9aaf22d88d7977de9055c1855b3"
      ],
      "author": {
        "name": "lihangyu",
        "email": "lihangyu@selectdb.com",
        "time": "Thu Jul 23 15:05:58 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 15:05:58 2026 +0800"
      },
      "message": "[fix](variant) Prefer sparse variant fields over root value (#65660)\n\nA distributed ordered query using TopN lazy\nmaterialization could return an array root such as [] instead of the\nobject fields stored in a VARIANT row. Root visibility already\nconsidered dense subcolumns and document snapshots but ignored sparse\nsubcolumns, so a non-null root incorrectly won during serialization.\nCheck the current row sparse offsets before treating the root as\nvisible. Before the fix the reproduced row returned []; after the fix it\nreturns {\"n\":7,\"word\":\"hot\"}."
    },
    {
      "commit": "60ba6aef642fa9aaf22d88d7977de9055c1855b3",
      "tree": "17822639b013ea20df91c359513caf072602fbbb",
      "parents": [
        "187154675fec90fdfefdd93521d58a7722105fe1"
      ],
      "author": {
        "name": "TsukiokaKogane",
        "email": "cby141994@gmail.com",
        "time": "Thu Jul 23 13:48:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 13:48:04 2026 +0800"
      },
      "message": "[fix](table stream) min delta delete op should return first op\u0027s before value (#65826)"
    },
    {
      "commit": "187154675fec90fdfefdd93521d58a7722105fe1",
      "tree": "eedccf6fd48a57a83bdd5cd6f108edc326af358f",
      "parents": [
        "58e87343513f708f985fdee49da81fb5e344b418"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Thu Jul 23 13:26:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 13:26:03 2026 +0800"
      },
      "message": "[improvement](be) Optimize projected fixed-width Parquet predicate filtering (#65934)\n\n### What problem does this PR solve?\n\nIssue Number: N/A\n\nRelated PR: #65921\n\nProblem Summary:\n\nWhen a fixed-width Parquet predicate column was also projected, Doris\nmaterialized all selected predicate values and compacted the complete\ncolumn after filtering. The native decoder had already produced the\nvalues needed by the raw predicate, so this added an avoidable\nmaterialize-and-compact pass. The overhead is visible in TPC-DS Q88.\n\n### What is changed?\n\nEvaluate eligible predicates on decoded fixed-width values and append\nonly matching values to the projected column in the same decoder pass.\n\n- Generalize the PLAIN-specific consumer and reader APIs into a\nfixed-width raw filter-and-project path.\n- Support PLAIN and BYTE_STREAM_SPLIT for identity-width INT32, INT64,\nFLOAT, and DOUBLE values.\n- Support DELTA_BINARY_PACKED for INT32 and INT64 values.\n- Prevalidate advertised chunk encodings before consuming definition\nlevels.\n- Reject an unexpected unsupported late-page encoding instead of\nattempting a fallback after cursor progress.\n- Keep the existing dictionary-id filter and dictionary materialization\npath unchanged.\n\nThe raw decoder cannot rewind after predicate evaluation. Therefore,\nwhen the predicate column is projected, survivors must be appended\nbefore the encoded values and definition-level cursor are consumed.\n\nPredicate-only columns retain their placeholder behavior. 
Nested\ncolumns, converted logical types, residual/delete predicates,\nunsupported expressions, and unsupported encodings continue to use the\nexisting materializing fallback.\n\n### Microbenchmark\n\nThe reader microbenchmark from #65921 was run with a Release build, warm\nfixture cache, CPU 8, a one-second minimum time, and 10 repetitions.\n\nThe matrix adds BYTE_STREAM_SPLIT and DELTA_BINARY_PACKED coverage with:\n\n- 10% alternating NULLs;\n- predicate-only and predicate-projected modes;\n- 1%, 10%, 50%, and 90% selectivity.\n\nAll 16 combinations completed with the expected raw and selected row\ncounts.\n\nProjected, 10% selectivity:\n\n| Encoding | CPU ns/raw row, master | CPU ns/raw row, PR | Improvement |\nCPU CV master / PR |\n|---|---:|---:|---:|---:|\n| BYTE_STREAM_SPLIT | 36.41 | 33.43 | 8.2% | 1.25% / 1.47% |\n| DELTA_BINARY_PACKED | 38.75 | 36.84 | 4.9% | 5.40% / 2.94% |\n\nThe Delta baseline ran under sustained host load, so its result should\nbe treated as directional.\n\nThe original projected PLAIN results remain:\n\n| Selectivity | CPU ns/raw row, master | CPU ns/raw row, PR |\nImprovement |\n|---:|---:|---:|---:|\n| 1% | 32.80 | 30.80 | 6.1% |\n| 10% | 33.95 | 31.82 | 6.3% |\n| 50% | 37.76 | 35.05 | 7.2% |\n| 90% | 42.06 | 37.66 | 10.5% |\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test: Unit Test and Manual test\n- 17 targeted ASAN unit tests covering raw expression evaluation,\nnullable mapping, projected PLAIN/BSS/Delta scans, unsupported-type\nfallback, mixed-page safety, residual predicates, and benchmark matrix\nconstraints\n    - Release benchmark target linked successfully\n    - 16-case BSS/Delta reader microbenchmark matrix\n    - Before/after projected-filter microbenchmarks\n    - clang-format 16, check-format, and `git diff --check`\n- clang-tidy was attempted but blocked by existing master/toolchain\nerrors: unmatched `NOLINTEND` in `be/src/core/types.h` and missing\n`stddef.h` from the configured toolchain\n- 
Behavior changed: No. This extends the existing raw predicate\noptimization to equivalent fixed-width decoder output.\n- Does this need documentation: No"
    },
    {
      "commit": "58e87343513f708f985fdee49da81fb5e344b418",
      "tree": "a889df61c32c2f731cd084a79a1d007a3f511f52",
      "parents": [
        "576b0acac0459d3ef72d439523f2b8d7e3118d9a"
      ],
      "author": {
        "name": "bobhan1",
        "email": "baohan@selectdb.com",
        "time": "Thu Jul 23 12:53:38 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 12:53:38 2026 +0800"
      },
      "message": "[fix](be) remove changing segment cache blocks to index type (#65905)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary:\n\nAfter a cloud segment is finalized, `SegmentWriter` allocates a cache\nholder over the segment index range and changes every intersecting cache\nblock to `INDEX`. This extra holder is not aligned to S3 multipart\nbuffer boundaries. In the deterministic reproduction, it creates cache\nblock `[241, 304]`, which crosses the boundary between buffers starting\nat offsets 0 and 256.\n\nNon-blocking close allows those multipart buffers to finish out of\norder. When the buffer at offset 256 finishes first, it claims `[241,\n304]` and writes its bytes at the beginning of that block. The cached\ndata is therefore shifted by 15 bytes, while the multipart object in\nremote storage remains correct. Both a direct cache read and a cached S3\nread observe the corrupted bytes.\n\nRemove the post-finalize cache-holder allocation and\n`change_cache_type(INDEX)` behavior. Segment blocks keep the cache type\nselected by the file writer, and this cross-buffer cache block is no\nlonger created by `SegmentWriter`.\n\nThe two commits intentionally preserve the proof in history:\n\n1. `028891cdc38` adds a deterministic end-to-end BEUT and fails on the\nexisting behavior.\n2. `0d6964d2f80` removes the behavior and removes the temporary BEUT and\nsynchronization hooks.\n\nThe final PR diff contains only the production-code deletion.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test: Unit Test\n- `./run-be-ut.sh --run\n--filter\u003dSegmentWriterFileCacheConcurrencyTest.ConcurrentLaterPartMustNotShiftCachedSegmentBytes\n-j100` on the first commit: expected failure, reproducing shifted cached\nbytes in `[241, 304]`\n- `./run-be-ut.sh --run --filter\u003dCloudFileCacheWriteIndexOnlyTest.*\n-j100` on the final commit: 3 tests passed\n- Behavior changed: Yes. 
Segment cache blocks are no longer changed to\n`INDEX` after segment finalization.\n- Does this need documentation: No"
    },
    {
      "commit": "576b0acac0459d3ef72d439523f2b8d7e3118d9a",
      "tree": "46b09681f304806f526b683c3d7d21019f209945",
      "parents": [
        "70a82532325bb6820f4b60f1e79a2b373cfd01be"
      ],
      "author": {
        "name": "924060929",
        "email": "lanhuajian@selectdb.com",
        "time": "Thu Jul 23 12:18:41 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 12:18:41 2026 +0800"
      },
      "message": "[fix](test) skip test_flight_record when the frontend is not on the regression runner (#65933)\n\nProblem Summary:\n\n`flightRecord` starts the java flight recorder by running `jps` and\n`jcmd` **on the machine that\nexecutes the regression suite**, so `demo_p0/test_flight_record` only\nworks when the frontend and\nthe regression runner are deployed together, e.g. a local development\ncluster. The case never\nverified that precondition.\n\nOn a runner without a local frontend the case failed with\n\n```\njava.lang.IllegalStateException: Can not found process: DorisFE\n    at .../demo_p0/test_flight_record.groovy:30\n```\n\nand was reported as a product failure, although nothing was wrong with\nthe product.\n\nThis PR checks the precondition in the case and returns early when the\nfrontend is not running on\nthis machine, the same way the case already skips on jdk below 17:\n\n```groovy\nString feProcessName \u003d \"DorisFE\"\nboolean feOnThisMachine \u003d false\ntry {\n    feOnThisMachine \u003d \"jps\".execute().text.readLines().any { it.contains(feProcessName) }\n} catch (Throwable t) {\n    logger.info(\"Can not execute jps: ${t.getMessage()}\")\n}\nif (!feOnThisMachine) {\n    logger.info(\"Process ${feProcessName} is not running on this machine, ... skip test\")\n    return\n}\n```\n\n`FlightRecordAction` is deliberately left untouched: once the action is\nreally invoked, failing\nloudly is the right behavior, the precondition belongs to the case. 
The\nrest of the case is kept\nas is, it still demonstrates the `flightRecord` api.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test\n    - [x] Manual test (add detailed scripts or steps below)\n\nRan the real regression harness against a local cluster (jdk17), with\n`sh run-regression-test.sh --run -d demo_p0 -s test_flight_record -g\nnonConcurrent`.\nThe \"frontend not here\" case is simulated by pointing the probed process\nname at a name that\ndoes not exist, which is what the action sees when the frontend is on\nanother machine.\n\n    | | frontend process | result |\n    |---|---|---|\n| before this PR | not found | `java.lang.IllegalStateException: Can not\nfound process: ...`, `failed 1 suites` |\n| after this PR | not found | `Process ... is not running on this\nmachine, ... skip test`, `failed 0 suites` |\n| after this PR | found | `JFR.start` / 11x `select 100` / `JFR.stop` /\nparse ok, `allocation bytes: 2348624`, `.jfr` cleaned up, `failed 0\nsuites` |"
    },
    {
      "commit": "70a82532325bb6820f4b60f1e79a2b373cfd01be",
      "tree": "a1ec2808d9471da769b93533666d1efe5318723e",
      "parents": [
        "d921bebcd90a3f55a2446a706d350f2624445c62"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Thu Jul 23 12:02:19 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 12:02:19 2026 +0800"
      },
      "message": "[feature](iceberg) Support nested column schema change (#65329)\n\n### What problem does this PR solve?\n\nProblem Summary:\n\nSupport nested Iceberg column paths in external schema-change\noperations, including parser, analyzer, catalog, and Iceberg metadata\nupdates.\n\n| Feature | Spark-Iceberg Syntax | Doris Syntax |\n|---------|-----------------------|--------------|\n| Add a nested field to struct | `ALTER TABLE t ADD COLUMN s.b INT` |\n`ALTER TABLE t ADD COLUMN s.b INT` |\n| Add a nested field to array element struct | `ALTER TABLE t ADD COLUMN\narr.element.b INT` | `ALTER TABLE t ADD COLUMN arr.element.b INT` |\n| Add a nested field to map value struct | `ALTER TABLE t ADD COLUMN\nm.value.b INT` | `ALTER TABLE t ADD COLUMN m.value.b INT` |\n| Add a nested field with position | `ALTER TABLE t ADD COLUMN s.b INT\nFIRST/AFTER a` | `ALTER TABLE t ADD COLUMN s.b INT FIRST/AFTER a` |\n| Drop a nested field | `ALTER TABLE t DROP COLUMN s.b` | `ALTER TABLE t\nDROP COLUMN s.b` |\n| Rename a nested field | `ALTER TABLE t RENAME COLUMN s.b TO c` |\n`ALTER TABLE t RENAME COLUMN s.b TO c` (the legacy form without `TO`\nremains accepted) |\n| Update nested field comment | `ALTER TABLE t ALTER COLUMN s.b COMMENT\n\u0027comment\u0027` | `ALTER TABLE t MODIFY COLUMN s.b COMMENT \u0027comment\u0027` |\n| Modify nested primitive type | `ALTER TABLE t ALTER COLUMN s.b TYPE\nBIGINT` | `ALTER TABLE t MODIFY COLUMN s.b BIGINT` |\n| Modify array element or map value type | `ALTER TABLE t ALTER COLUMN\narr.element TYPE BIGINT`\u003cbr\u003e`ALTER TABLE t ALTER COLUMN m.value TYPE\nBIGINT` | `ALTER TABLE t MODIFY COLUMN arr.element BIGINT`\u003cbr\u003e`ALTER\nTABLE t MODIFY COLUMN m.value BIGINT` |\n| Reorder an existing nested field | `ALTER TABLE t ALTER COLUMN s.b\nFIRST/AFTER a` | `ALTER TABLE t MODIFY COLUMN s.b \u003ctype\u003e FIRST/AFTER a`\n|\n| Change a required field to nullable | `ALTER TABLE t ALTER COLUMN s.b\nDROP NOT NULL` | `ALTER TABLE t MODIFY 
COLUMN s.b \u003ctype\u003e NULL` |\n| Evolve a map key | Not supported by Iceberg | Not supported |\n\nUser-visible behavior and boundaries:\n\n- A newly added Iceberg nested field must be nullable; adding a required\nnested field is not supported by Iceberg.\n- `MODIFY COLUMN` only applies primitive type promotions accepted by the\nIceberg Java API. Unsupported conversions fail without committing a\npartial schema change.\n- Omitting `NULL`/`NOT NULL` preserves existing requiredness. Use an\nexplicit `NULL` on the exact nested path to change a required field to\noptional; modifying a whole complex column no longer infers recursive\nrequired-to-optional changes.\n- Omitting `COMMENT` preserves the existing Iceberg field documentation.\n`COMMENT \u0027\u0027` explicitly clears it where Iceberg supports field comments.\n- Nested default values are out of scope for this PR. No nested\ndefault-value materialization or write-side default behavior is added.\n- Collection pseudo-fields such as `arr.element`, `map.key`, and\n`map.value` can be addressed for supported type changes, but Iceberg\ndoes not persist comments directly on those pseudo-fields.\n- ALTER validation now resolves the target table before\ntable-type-specific operation validation. If both the table and clause\nare invalid, the missing-table error is reported first.\n- Replayed schema-change SQL keeps ordinary struct field names unquoted\nand quotes reserved or special identifiers only when required.\n\nNot included in this PR:\n\n1. Nested default-value support.\n2. 
Iceberg v3 promotions `unknown -\u003e any` and `date -\u003e\ntimestamp/timestamp_ns`; the current Iceberg Java API used by Doris does\nnot expose these promotions.\n\n### Release note\n\nSupport Iceberg nested column schema evolution through `ALTER TABLE`,\nincluding add, drop, rename, comment, position, nullability relaxation,\nand legal primitive type promotion operations.\n\n### Check List (For Author)\n\n- Test\n    - [x] Regression test\n- `external_table_p0/iceberg/test_iceberg_nested_schema_evolution_ddl`\n-\n`external_table_p0/iceberg/test_iceberg_nested_schema_evolution_spark_doris_interop`\n        - `external_table_p0/iceberg/iceberg_schema_change_ddl`\n    - [x] Unit Test\n        - `IcebergMetadataOpsValidationTest`\n        - `IcebergNestedSchemaEvolutionParserTest`\n        - `PruneNestedColumnTest`\n    - [ ] Manual test\n    - [ ] No need to test or manual test\n\n- Behavior changed:\n    - [ ] No.\n- [x] Yes. Iceberg nested schema changes are supported, omitted\nnullability/comment clauses preserve existing metadata, and\nmissing-table validation takes precedence over clause validation.\n\n- Does this need documentation?\n    - [ ] No.\n- [x] Yes. A follow-up documentation PR is required for the new nested\nschema-change syntax and behavior boundaries.\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label"
    },
    {
      "commit": "d921bebcd90a3f55a2446a706d350f2624445c62",
      "tree": "f6eeaa320ece66a3ffe0912280dcf07226f869a6",
      "parents": [
        "f2f2a5701f0de5c140debe6812019a03f07549d1"
      ],
      "author": {
        "name": "feiniaofeiafei",
        "email": "moailing@selectdb.com",
        "time": "Thu Jul 23 11:44:59 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 11:44:59 2026 +0800"
      },
      "message": "[fix](agg) Normalize equivalent multi-column distinct counts (#65206)\n\n### What problem does this PR solve?\n\nRelated PR: #54079\n\nProblem Summary: NormalizeAggregate treated reordered equivalent\nmulti-column COUNT DISTINCT expressions as different aggregate\nfunctions, which could make the multi-distinct strategy construct a join\nfrom only one aggregate and fail with an index-out-of-bounds exception.\nIt also retained duplicate arguments even though downstream COUNT\nDISTINCT processing treats them as a set, and the CountIf conversion\nindexed the original argument list using the deduplicated size.\nCanonicalize multi-column distinct arguments as an ordered set during\nnormalization and make CountIf consume that deduplicated list directly.\n\n### Release note\n\nFix planning and execution of equivalent multi-column COUNT DISTINCT\nexpressions with reordered or duplicate arguments."
    },
    {
      "commit": "f2f2a5701f0de5c140debe6812019a03f07549d1",
      "tree": "bca5a9206aff4fbc421d057582ccb0d11c28bcb9",
      "parents": [
        "cc9491a751d920d5ae970aab5b1fa6d6af686283"
      ],
      "author": {
        "name": "TengJianPing",
        "email": "tengjianping@selectdb.com",
        "time": "Thu Jul 23 11:13:26 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 11:13:26 2026 +0800"
      },
      "message": "[fix](be) Fix waiter accounting in blocking queue (#65827)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nProblem Summary: \nBlockingQueue split ownership of waiter counters between waiting threads\nand notifying threads. Around a timeout race, both sides could decrement\nthe same registration, while a spurious wakeup could leave a stale\nregistration behind. Once the counters drifted, a real waiter could be\nrepresented by zero and miss a notification, leaving routine load queue\noperations blocked until the one-hour fallback timeout. This is a missed\nwakeup and potentially long stall, rather than a permanent deadlock. In\naddition, try_put read the shutdown flag and queue size before acquiring\nthe mutex, which introduced a data race.\n\nThe following sequence shows how lock contention after a timeout\ncorrupts the consumer waiter count:\n\n| Time | Consumer threads | Producer threads | get_waiting | Result |\n| --- | --- | --- | ---: | --- |\n| T0 | C increments the counter and calls wait_for, which releases the\nmutex. | - | 0 -\u003e 1 | C is registered as a condition-variable waiter. |\n| T1 | C\u0027s timeout expires. C leaves the condition-variable wait set,\nbut wait_for must reacquire the mutex before returning. | P acquires the\nmutex before C can reacquire it. | 1 | C\u0027s registration is still visible\nbecause C cannot update it without the mutex. |\n| T2 | C remains blocked while trying to reacquire the mutex. | P pushes\nan item, decrements the counter from 1 to 0, unlocks, and calls\nnotify_one. | 1 -\u003e 0 | The notification cannot wake C because its timed\nwait has already expired. |\n| T3 | C reacquires the mutex. wait_for returns timeout, so the old\ntimeout branch decrements the same registration again. C then consumes\nthe item. | - | 0 -\u003e SIZE_MAX | The unsigned waiter count wraps around.\n|\n| T4 | After the queue becomes empty, a new consumer C2 increments the\ncounter and starts waiting. 
| - | SIZE_MAX -\u003e 0 | A real waiter is now\nrepresented by zero. |\n| T5 | Absent a spurious wakeup, shutdown, or another notification, C2\nremains asleep until its next timeout, which is one hour by default. |\nP2 pushes an item, observes zero, and skips notify_one. | 0 | C2 suffers\na missed wakeup and a potentially one-hour stall. |\n\nMake each waiting thread register immediately before wait_for and always\nunregister after wait_for returns, regardless of the wakeup reason.\nNotifiers now inspect waiter counters under the mutex without consuming\nregistrations, then release the mutex before notifying. Remove the\nunsynchronized try_put fast path and add deterministic tests for both\nconsumer and producer timeout races.\n\n### Release note\n\nFix potential long stalls in backend blocking queues under concurrent\ntimeout and notification.\n\n### Check List (For Author)\n\n- Test: Unit Test\n- Added BlockingQueueWaiterTest.TimedGetNotificationRace and\nBlockingQueueWaiterTest.TimedPutNotificationRace.\n    - `./run-be-ut.sh --run --filter\u003d\u0027BlockingQueueWaiterTest.*\u0027 -j 32`\n- Behavior changed: Yes. Waiter registrations are now released by the\nwaiting thread, and try_put checks queue state only while holding the\nmutex.\n- Does this need documentation: No"
    },
    {
      "commit": "cc9491a751d920d5ae970aab5b1fa6d6af686283",
      "tree": "d1541c3c6f15f690b3242a1e793485a88a600565",
      "parents": [
        "0c2a7b5a7cf3a2c4bf74d3e99341de98fa5fd01f"
      ],
      "author": {
        "name": "zclllyybb",
        "email": "zhaochangle@selectdb.com",
        "time": "Thu Jul 23 11:10:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 11:10:17 2026 +0800"
      },
      "message": "[fix](regression) Tolerate dynamic partition scheduler race (#65902)\n\nThe regression case can overlap with a dynamic-partition scheduler run\nthat was started while the global interval was one second. Restoring the\ninterval to 600 seconds does not cancel that in-flight run. If it drops\nthe out-of-range `p1900` partition while the multi-row INSERT is\ncommitting, the INSERT can fail even though the scheduler reached its\nintended final state.\n\nThe prior workaround depended on an exact exception message and issued\n`show partitions` inside a `TestAction` check closure. That nested DSL\ncall changes the action current SQL, making the failure report and its\nstring-based classification brittle.\n\nThis change drains the likely preceding short-interval run before\nINSERT, captures the INSERT exception without nested DSL calls, and\ntolerates only the known terminal state: `p1900` is absent while the\n`2024`, `2900`, and `3000` auto-created partitions remain. All other\nfailures still fail the case. The intentional one-second cleanup\ninterval is restored to 600 seconds in `finally`."
    },
    {
      "commit": "0c2a7b5a7cf3a2c4bf74d3e99341de98fa5fd01f",
      "tree": "8764616d41bef32528b1d945b3f30244163726be",
      "parents": [
        "4016780fca4ed17b6f5fdf6cfb616c3e18dcabcd"
      ],
      "author": {
        "name": "meiyi",
        "email": "meiyi@selectdb.com",
        "time": "Thu Jul 23 11:02:14 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 11:02:14 2026 +0800"
      },
      "message": "[fix](be) Backpressure async group commit by table WAL count (#65362)\n\nProblem Summary: Async group commit WAL replay failures can leave table\nWAL backlog growing while later stream loads continue to be admitted.\nThis adds a per-table group commit WAL count limit, tracks WAL queue\nsize in WalManager, records replay failure reasons for diagnostics, and\nrejects new async group commit loads once the backlog reaches the\nconfigured limit."
    },
    {
      "commit": "4016780fca4ed17b6f5fdf6cfb616c3e18dcabcd",
      "tree": "c0b24baa5aa8631dff308188123ee5ea82c0fafe",
      "parents": [
        "9ec7f76d15efc2df544b8ba6c1afdee76fb9d8c4"
      ],
      "author": {
        "name": "zhangstar333",
        "email": "zhangsida@selectdb.com",
        "time": "Thu Jul 23 10:42:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 10:42:03 2026 +0800"
      },
      "message": "[fix](paimon) Handle special characters in partition values (#65904)\n\nPaimon `Partition.spec()` returns logical partition values. Doris\npreviously concatenated them into a Hive-style partition path and parsed\nit with `HiveUtil.toPartitionValues()`.\n\nWhen a partition value contains `/`, `\u003d`, or other path-related\ncharacters, it can be parsed incorrectly, causing invalid partition\nmetadata, repeated warning logs, and potentially\nincreased planning latency."
    },
    {
      "commit": "9ec7f76d15efc2df544b8ba6c1afdee76fb9d8c4",
      "tree": "62655cde51462e6bb1009bf7de74cd874711f0a1",
      "parents": [
        "cbf851ddb91cf57b4af8a1c01b61a69080eeaf35"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Thu Jul 23 08:01:50 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 23 08:01:50 2026 +0800"
      },
      "message": "[bench](parquet) Add native reader microbenchmark matrix (#65921)\n\n## What problem does this PR solve?\n\nAdds a reproducible local microbenchmark foundation for the native\nParquet reader described in the [benchmark\ndesign]. It\nseparates page decoder cost from end-to-end local reader cost and keeps\nfixture construction and encoding validation outside measured regions.\n\n## What is changed?\n\n- Add 152 native decoder cases covering PLAIN, dictionary,\nbyte-stream-split, and DELTA encodings across fixed-width and binary\nphysical types, with clustered and alternating sparse selections.\n- Add 137 local reader cases covering open-to-first-block, full scan,\npredicate scan, LIMIT shapes, null density/pattern, predicate\nselectivity, projection mode, schema width, predicate position, and four\nphysical encodings.\n- Generate deterministic fixtures lazily and validate every row group\nand column encoding from Parquet footer metadata before measuring.\n- Report rows/s, bytes/s, raw/selected rows, fixture size, ns/raw row,\nand ns/selected row.\n- Add matrix constraint tests and local build/run documentation.\n\n## Check List\n\n- [x] Release benchmark target builds with `ninja -C be/build_RELEASE\n-j128 benchmark_test`\n- [x] Scenario matrix tests pass (4/4)\n- [x] All 152 decoder benchmarks pass\n- [x] All 137 reader benchmarks pass\n- [x] clang-format 16 and `git diff --check` pass"
    },
    {
      "commit": "cbf851ddb91cf57b4af8a1c01b61a69080eeaf35",
      "tree": "ff90f82113d49e5089c50628e183f50d7660ab53",
      "parents": [
        "a8b1fd95e2a0e08f1b1f88c8c34b244b85ea96a7"
      ],
      "author": {
        "name": "Yixuan Wang",
        "email": "wangyixuan@selectdb.com",
        "time": "Wed Jul 22 21:53:25 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 21:53:25 2026 +0800"
      },
      "message": "[fix](cloud) Preserve resource IDs when recycling empty rowsets (#65862)\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: https://github.com/apache/doris/pull/64630\n\nProblem Summary: When a dropped tablet contains only empty rowsets, the\nrecycler\nskips every rowset before collecting its resource ID. As a result,\nresource_ids remains empty and no tablet directory deletion is\nscheduled.\nMetadata is then removed and the tablet is marked as recycled, leaving\ntemporary objects from failed writes orphaned in object storage.\n\nAllow empty rowsets with a valid resource ID to participate in\ntablet-level\ndirectory cleanup, while continuing to skip empty rowsets without a\nresource\nID and reject non-empty rowsets with a missing resource ID."
    },
    {
      "commit": "a8b1fd95e2a0e08f1b1f88c8c34b244b85ea96a7",
      "tree": "f39a4a60b3e234ed2e8a39b5e18539d08a65bbfd",
      "parents": [
        "f45e0931636f56ded31e332302423356837f840a"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Wed Jul 22 20:22:57 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 20:22:57 2026 +0800"
      },
      "message": "[improvement](file) Build native Parquet scan path for FileScannerV2 (#65674)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nFileScannerV2\u0027s Parquet path previously relied on Arrow-oriented\nmetadata and value adapters, built intermediate decoded objects before\nmaterializing Doris columns, and lacked a single V2-owned contract for\nschema mapping, predicate execution, sparse reads, cache identity, and\nmetadata aggregation. This duplicated decode/conversion work, retained\nlarge temporary buffers for nested and string columns, and made nullable\nsparse scans degrade into many small read/skip operations.\n\nThis PR builds an independent native Parquet scan path under\n`be/src/format_v2`. The V2 path owns its metadata tree, page/level/value\ndecoders, predicate scheduling, aggregate metadata handling, and\nDoris-column materialization. It does not call the legacy V1 Parquet\nreader or modify the V1 implementation under `be/src/format`.\n\n### What is changed?\n\n#### Native metadata, planning, and validation\n\n- Replace the metadata adapter with a native metadata tree shared by\nschema mapping, row-group planning, statistics, dictionary pruning,\nBloom filters, page indexes, and scan readers.\n- Preserve compatible LIST/MAP/STRUCT wrapper semantics while validating\nmalformed schema nodes, primitive-only group annotations, physical\ntypes, flat-leaf value counts, offsets, sizes, page headers, levels,\ndictionary IDs, and index metadata before use.\n- Keep schema evolution and table/file type reconciliation in the V2\n`ColumnMapper` expression layer.\n\n#### Native decoding and direct materialization\n\n- Decode Page V1/V2 definition and repetition levels and PLAIN, BOOLEAN\nRLE, dictionary, DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY,\nDELTA_BYTE_ARRAY, and BYTE_STREAM_SPLIT value streams.\n- Keep physical decoding in native decoders and physical-to-logical\nconversion in `DataTypeSerDe`, appending 
directly into final Doris\ncolumns.\n- Reconstruct ARRAY/MAP/STRUCT values from native levels and child\nreaders; levels-only and COUNT scans avoid unused payload\nmaterialization.\n- Remove obsolete Arrow RecordReader/builders, metadata adapters,\ncompatibility overloads, and I/O shims from the V2 chain.\n\n#### Nullable sparse, dictionary, and predicate optimization\n\n- Normalize sparse selections into selected non-null physical ranges,\ndecode them in batches, and restore nullable logical positions\nseparately for primitive, DECIMAL, DATE/DATETIME, string/binary, and\ndictionary columns.\n- Materialize typed dictionaries once per generation, reuse them for\npruning/filtering/output, and select cache-aware direct or\ncompact-gather paths according to dictionary and selection size.\n- Evaluate eligible predicate-only PLAIN `INT`, `BIGINT`, `FLOAT`, and\n`DOUBLE` comparisons directly on physical values without materializing\nand compacting a Doris predicate column.\n- Retain predicate row mappings until a required\nmulti-column/delete/output boundary and let fully filtered physical\nbatches grow independently of final output block size.\n- Reuse bounded decoder, level, selection, null-map, binary, and nested\nscratch buffers.\n\n#### Metadata aggregate pushdown\n\n- Carry semantic COUNT arguments from FE only for FileScannerV2, keeping\nV1 file scans and internal-table planning on their existing paths.\n- Support exact `COUNT(nullable_col)` using definition levels while\nrejecting mixed `COUNT(*)` plus `COUNT(nullable_col)` pushdown when one\nmetadata cardinality cannot represent both aggregates.\n- Emit synthetic metadata COUNT rows in runtime-sized batches instead of\nallocating a block proportional to file cardinality.\n- Use footer bounds for MIN/MAX aggregation only when explicit exactness\nflags permit it; legacy files with absent flags retain compatible\nbehavior.\n- Fall back to normal scanning when aggregate metadata cannot prove an\nexact result.\n\n#### 
I/O, cache, and observability\n\n- Use V2-owned footer/page caches and stable file identities, coalesce\nsafe remote ranges, and keep decoder/dictionary mutable state outside\nshared cache entries.\n- Add counters for native decoding, sparse selections and NULL fallback,\npredicate compaction, direct predicates, dictionary filtering, adaptive\nbatching, cache activity, aggregate reads, and retained scratch.\n- Update the FileScannerV2 design and review documents for native\nmetadata, decoding, selection, predicate, cache, fallback, and ownership\ncontracts.\n\n### Correctness and compatibility\n\n- Unsupported encodings, conversions, nested optimization boundaries,\nmixed dictionary/plain chunks, incomplete indexes, or unverifiable\nmetadata disable only the relevant optimization or return a bounded\ncorruption error before decoder cursors are consumed.\n- Missing statistics, dictionaries, Bloom filters, page indexes, or\nstable cache identities never remove candidate rows.\n- All optimization fallbacks stay within V2; the legacy V1\nimplementation remains unchanged.\n\n### Release note\n\nFileScannerV2 Parquet scans now use a V2-owned native metadata and\ndecoding pipeline with direct Doris-column materialization, nullable\nsparse decoding, cache-aware dictionaries, direct physical predicates,\nand bounded exact metadata aggregation.\n\n### Check List (For Author)\n\n- Test: Unit Test\n  - Review-fix BE tests: 4/4 passed.\n- Related BE suites (`TableReaderTest`, `NativeParquetStatisticsTest`,\nand `ParquetSchemaTest`): 102/102 passed under ASAN.\n  - FE `PhysicalStorageLayerAggregateTest`: 6/6 passed with checkstyle.\n  - Clang-format 16 and `git diff --check`: passed.\n- Behavior changed: Yes. FileScannerV2 uses the native V2 Parquet scan\nchain and exact, bounded metadata aggregate paths.\n- Does this need documentation: Yes. The FileScannerV2 Parquet design\nand review documents are updated in this PR."
    },
    {
      "commit": "f45e0931636f56ded31e332302423356837f840a",
      "tree": "c4c8081dd0d0dab2ffbc14e099afffd0d724cf5c",
      "parents": [
        "ab043a433cef0246bd81fbe76011a83b9a13d17b"
      ],
      "author": {
        "name": "Pxl",
        "email": "pxl290@qq.com",
        "time": "Wed Jul 22 20:16:34 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 20:16:34 2026 +0800"
      },
      "message": "[fix](runtime-filter) Preserve query semantics during partition pruning (#65857)\n\nWith runtime filter partition pruning enabled, a join target expression\nwas eligible for partition pruning even when it contained a non-movable\nfunction such as `assert_true`. The pruning path then evaluated\n`assert_true` against LIST partition boundary values before normal row\npredicates ran. A boundary belonging only to filtered-out rows could\ntherefore introduce an `INVALID_ARGUMENT` error, while the same query\nreturned `1` with partition pruning disabled.\n\nThe FE runtime-filter partition-prune classifier now rejects target\nexpressions containing `NoneMovableFunction`. This prevents speculative\npartition-boundary evaluation of `assert_true`; both partition-pruning\nmodes preserve normal query semantics and return `1`."
    },
    {
      "commit": "ab043a433cef0246bd81fbe76011a83b9a13d17b",
      "tree": "d17dcc0b9b01884084f4180a6b4dfca154d9b1a3",
      "parents": [
        "a9816faa6652362b4fc8fbbf057d62652e016624"
      ],
      "author": {
        "name": "wudi",
        "email": "wudi@selectdb.com",
        "time": "Wed Jul 22 16:41:31 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 16:41:31 2026 +0800"
      },
      "message": "[fix](fe) Normalize MySQL JDBC URLs for streaming jobs (#65647)\n\n### What problem does this PR solve?\n\nMySQL Connector/J interprets TINYINT(1) as BOOLEAN and\nYEAR as a date unless JDBC URL options override those defaults.\nStreaming jobs did not normalize the URL consistently across the FROM\nMYSQL and cdc_stream TVF paths, which could change automatically\ndiscovered Doris column types and YEAR zero values. This change adds a\nshared streaming JDBC URL normalizer and applies it before metadata\ndiscovery and CDC reads. MySQL streaming URLs now use\nyearIsDateType\u003dfalse, tinyInt1isBit\u003dfalse, useUnicode\u003dtrue, and\ncharacterEncoding\u003dutf-8. PostgreSQL URLs remain unchanged, and\nrewriteBatchedStatements is intentionally not added because streaming\ningestion writes through Stream Load."
    },
    {
      "commit": "a9816faa6652362b4fc8fbbf057d62652e016624",
      "tree": "87a7351fa568e4086edf20a7ef9f95dce6e0e75f",
      "parents": [
        "e904fbe7d9c13c49f8bfa690d90ea963630e570d"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Wed Jul 22 14:26:19 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 14:26:19 2026 +0800"
      },
      "message": "[refactor](cloud) Rename cloud compute group metadata class (#65817)\n\nProblem Summary: The FE cloud catalog metadata entity shared the\nComputeGroup class name with the resource routing abstraction, making\nimports and usages ambiguous. Rename the cloud metadata entity to\nCloudComputeGroupMeta and update all production and unit-test references\nwithout changing runtime behavior or persisted metadata."
    },
    {
      "commit": "e904fbe7d9c13c49f8bfa690d90ea963630e570d",
      "tree": "e03b34d2f5136091ac7b09da3e2927471f3f3081",
      "parents": [
        "88763c118581fa8bc207fccbdf6143fb5380b6a6"
      ],
      "author": {
        "name": "HappenLee",
        "email": "happenlee@selectdb.com",
        "time": "Wed Jul 22 11:51:23 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 11:51:23 2026 +0800"
      },
      "message": "[fix](be) Fix array_difference out-of-bounds read on single-element array (#65855)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nProblem Summary:\nFunctionArrayDifference::impl read src[begin+1] before checking whether\nbegin+1 \u003c end. For a single-element array (begin+1 \u003d\u003d end) this accessed\none element past the valid range, causing undefined behavior.\n\nThe fix rewrites the loop so that it only reads src[curr_pos] when\ncurr_pos \u003c end, while preserving the existing semantics.\n\nThis change also adds regression cases for single-element arrays and\nfor int32 overflow scenarios (e.g. [INT32_MAX, INT32_MIN]) to ensure\narray_difference widens to bigint before subtraction.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test: Regression test\n- Behavior changed: No\n- Does this need documentation: No"
    },
    {
      "commit": "88763c118581fa8bc207fccbdf6143fb5380b6a6",
      "tree": "f25018362c844839ca31860e10189a62775f4645",
      "parents": [
        "aa559d48dc9c346595dc3a713ee321df7fbad127"
      ],
      "author": {
        "name": "heguanhui",
        "email": "hgh_wy163mail@163.com",
        "time": "Wed Jul 22 10:37:58 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 10:37:58 2026 +0800"
      },
      "message": "[fix](fe) Remove oidc plugin assertions from AuthenticationPluginManagerTest (#63323)\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nProblem Summary: `AuthenticationPluginManagerTest` asserts the existence\nof an OIDC authentication plugin (`oidc`), but no OIDC plugin\nimplementation exists in the codebase. The `OidcPluginFactory` class\nreferenced in `AuthenticationPluginFactory.java` comments is only a\ndocumentation example with no actual implementation, module, or SPI\nregistration. This causes `testPluginsAutoLoaded` and `testGetFactory`\nto fail.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test\n    - [x] Unit Test\n- Behavior changed:\n    - [x] No.\n- Does this need documentation?\n    - [x] No."
    },
    {
      "commit": "aa559d48dc9c346595dc3a713ee321df7fbad127",
      "tree": "6c118ced5eb99771b809b02970759f66cd8fd77c",
      "parents": [
        "1b0217a7c8344b6a0dab283bad1c7fd4f9e18ec7"
      ],
      "author": {
        "name": "Mingyu Chen (Rayner)",
        "email": "yunyou@selectdb.com",
        "time": "Wed Jul 22 10:35:53 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 10:35:53 2026 +0800"
      },
      "message": "[fix](workflow) Fix BE UT macOS job failing with wrong JDK version (#65796)\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #58914\n\nProblem Summary:\n\nThe `BE UT (macOS)` job fails at the JDK version check, e.g. in\n\nhttps://github.com/apache/doris/actions/runs/29692786375/job/88208347509:\n\n```\nCheck JAVA version\nERROR: The JAVA version is 25, it must be JDK-17.\n```\n\nRoot cause: #58914 migrated the runner from the deprecated `macos-13`\n(Intel / x64) to `macos-15`, which is Apple Silicon (arm64). GitHub\u0027s\nrunner\nimages expose the JDK path as `JAVA_HOME_\u003cversion\u003e_\u003carch\u003e`, so on arm64\nit is\n`JAVA_HOME_17_arm64`, and the old `JAVA_HOME_17_X64` is unset. The\nworkflow\nstill read `JAVA_HOME_17_X64`, so `JAVA_HOME` became empty; the build\nthen fell\nback to the system default Java (25) and `check_jdk_version` failed.\n\nFix: read `JAVA_HOME_17_arm64` with a fallback to `JAVA_HOME_17_X64`,\nkeeping\nthe job working on Apple Silicon runners while staying compatible with\nIntel\nones. The trailing-slash stripping of the original is preserved.\n\nNote: this workflow\u0027s paths-filter only triggers on `be/**` /\n`gensrc/**`\nchanges, so this PR (which only edits the workflow) will not run the\nmacOS\nBE UT job itself. The fix will be exercised by the next scheduled run or\nthe\nnext PR touching BE code. The env-var resolution was verified locally\nfor both\narm64 and x64 cases."
    },
    {
      "commit": "1b0217a7c8344b6a0dab283bad1c7fd4f9e18ec7",
      "tree": "121877bfafcc7b717a9b318db37a1914f9f4a996",
      "parents": [
        "6875132c3d2b0416d95ef6849658fb5959331735"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Wed Jul 22 09:21:56 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jul 22 09:21:56 2026 +0800"
      },
      "message": "[fix](iceberg) Honor partial name mappings for legacy files (#65784)\n\n### What problem does this PR solve?\n\nIssue Number: N/A\n\nRelated PR: N/A\n\nProblem Summary: Legacy Iceberg Parquet and ORC files without field IDs\nare resolved through name mapping. With a partial mapping, FE omitted\nthe optional mapping metadata for unmapped fields, so both V1 and V2 BE\nreaders fell back to the current physical name and read unrelated data.\n\nThis change preserves table-level mapping presence by transporting\nexplicit empty per-field lists and makes those lists authoritative in\nboth readers. Unmapped fields now materialize their default or NULL,\nwhile mapped aliases and scans without name mapping retain their\nexisting behavior.\n\n### Release note\n\nFix reads of migrated Iceberg files with partial name mapping so fields\nomitted from the mapping materialize their default value or NULL instead\nof matching a physical column by the current name.\n\n### Check List (For Author)\n\n- Test: Unit Test\n    - `ExternalUtilTest` (6 tests)\n    - Focused V1 and V2 name-mapping BE unit tests under ASAN (10 tests)\n- Behavior changed: Yes. Partial Iceberg name mappings are now\nauthoritative for legacy files without field IDs.\n- Does this need documentation: No"
    },
    {
      "commit": "6875132c3d2b0416d95ef6849658fb5959331735",
      "tree": "dd00eff769a73f871a5e06028e9a7f143568b0f5",
      "parents": [
        "25bdeb83e769eb162f9ecdcc03135894dbb00ffd"
      ],
      "author": {
        "name": "HappenLee",
        "email": "happenlee@selectdb.com",
        "time": "Tue Jul 21 18:03:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 18:03:40 2026 +0800"
      },
      "message": "[improvement](be) Optimize bit unpacking with PDEP (#65738)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary:\n\n`BitPacking::UnpackValues` currently decodes every complete 32-value\nbatch with scalar shifts and masks. This PR adds an x86 PDEP\nimplementation for `uint8_t`, `uint16_t`, and `uint32_t`, with AVX2\nwidening for narrow `uint16_t` and `uint32_t` values.\n\nThe optimized functions are compiled with `target(\"bmi2,avx2\")`. Runtime\ndispatch uses `__builtin_cpu_supports(\"bmi2\")` and\n`__builtin_cpu_supports(\"avx2\")`; unsupported CPUs, architectures,\noutput types, and remainder values continue to use the generic scalar\nimplementation. Doris therefore does not require a global `-mbmi2` build\nflag.\n\nProduction dispatch deliberately uses PDEP only when `BIT_WIDTH \u003c 16`.\nThe generic PDEP implementation remains available to the benchmark for\nall supported widths, but repeated measurements found non-monotonic,\nCPU- and working-set-dependent results at 16-32 bits. A detailed code\ncomment explains why the scalar implementation is retained for these\nwidths instead of adding an irregular CPU-specific allowlist.\n\nThe PR also adds unit tests for every supported bit width, including\nfull batches and truncated/remainder input, plus a reproducible\nbenchmark comparing:\n\n- the generic scalar kernel;\n- the direct PDEP kernel;\n- the actual `BitPacking::UnpackValues` dispatch path.\n\nThe benchmark covers 4K, 256K, and 1M values and validates optimized\noutput against the scalar reference before measuring.\n\n#### Benchmark results\n\nHardware: Intel Xeon Platinum 8457C, 48 KiB L1D and 2 MiB private L2 per\ncore. Release build pinned to one CPU. Values below are CPU-time medians\nin microseconds.\n\nL1-sized working set: 4K `uint32_t` values, measured with 9 randomized\nrepetitions. 
Each iteration touches a 16 KiB output and 1-7 KiB of\npacked input; including the benchmark\u0027s reference output, all buffers\ntotal 33-39 KiB and fit in the 48 KiB L1D.\n\n| bit width | scalar | PDEP | speedup |\n| ---: | ---: | ---: | ---: |\n| 2 | 0.628 | 0.338 | 1.86x |\n| 4 | 0.551 | 0.336 | 1.64x |\n| 6 | 0.969 | 0.355 | 2.73x |\n| 8 | 0.224 | 0.221 | 1.01x |\n| 10 | 1.216 | 0.565 | 2.15x |\n| 12 | 0.870 | 0.564 | 1.54x |\n| 14 | 0.828 | 0.737 | 1.12x |\n\nL2-sized working set: 256K `uint32_t` values. The output is 1 MiB and\nthe packed input is 64-448 KiB.\n\n| bit width | scalar | PDEP | speedup |\n| ---: | ---: | ---: | ---: |\n| 2 | 41.0 | 27.9 | 1.47x |\n| 4 | 38.5 | 29.0 | 1.33x |\n| 6 | 61.6 | 28.6 | 2.15x |\n| 8 | 32.5 | 31.5 | 1.03x |\n| 10 | 77.6 | 36.0 | 2.16x |\n| 12 | 51.1 | 35.9 | 1.42x |\n| 14 | 50.6 | 47.1 | 1.07x |\n\nThe result is workload- and bit-width-dependent. PDEP is faster for all\nmeasured widths while the working set remains in L1 or L2, although the\ngains at widths 8 and 14 are marginal. At 1M values, the actual PDEP\npath is faster than the scalar path at widths 10, 12, and 14, but slower\nat widths 2, 4, 6, and 8.\n\n#### High-width dispatch validation\n\nWidths 16-32 were measured twice independently. 
The following table is\nthe second run with 9 randomized repetitions; `scalar/PDEP` greater than\n1 means PDEP is faster.\n\n| bit width | 256K scalar | 256K PDEP | scalar/PDEP | 1M scalar | 1M\nPDEP | scalar/PDEP |\n| ---: | ---: | ---: | ---: | ---: | ---: | ---: |\n| 16 | 27.84 | 55.89 | 0.498x | 234.49 | 273.85 | 0.856x |\n| 17 | 62.60 | 60.04 | 1.043x | 253.42 | 255.63 | 0.991x |\n| 18 | 65.04 | 56.52 | 1.151x | 263.04 | 258.88 | 1.016x |\n| 19 | 77.90 | 59.92 | 1.300x | 313.56 | 261.44 | 1.199x |\n| 20 | 63.27 | 55.17 | 1.147x | 283.19 | 264.64 | 1.070x |\n| 21 | 67.23 | 58.28 | 1.154x | 277.33 | 269.93 | 1.027x |\n| 22 | 70.41 | 57.38 | 1.227x | 284.12 | 271.71 | 1.046x |\n| 23 | 68.79 | 68.49 | 1.004x | 281.42 | 289.78 | 0.971x |\n| 24 | 73.75 | 56.84 | 1.297x | 342.26 | 278.40 | 1.229x |\n| 25 | 67.05 | 81.08 | 0.827x | 291.52 | 330.73 | 0.881x |\n| 26 | 69.09 | 79.70 | 0.867x | 290.12 | 324.68 | 0.894x |\n| 27 | 66.78 | 61.01 | 1.095x | 285.06 | 293.61 | 0.971x |\n| 28 | 63.18 | 75.05 | 0.842x | 290.85 | 310.27 | 0.937x |\n| 29 | 64.87 | 58.74 | 1.104x | 295.88 | 304.93 | 0.970x |\n| 30 | 64.47 | 60.25 | 1.070x | 303.09 | 307.15 | 0.987x |\n| 31 | 53.94 | 66.08 | 0.816x | 316.73 | 310.85 | 1.019x |\n| 32 | 42.03 | 71.03 | 0.592x | 297.63 | 333.45 | 0.893x |\n\nThe first independent run showed the same material regressions at widths\n16, 25, 26, 28, and 32. The `uint16_t` width-16 boundary was also slower\nwith PDEP at 4K, 256K, and 1M values. Some high widths improve, but the\nprofitable set changes with the working set and has no monotonic\nboundary. The production path therefore conservatively retains scalar\nunpacking for all widths at or above 16. 
After this change, the actual\nentry point was benchmarked again: width 15 follows direct PDEP, while\nwidths 16-32 follow the scalar path and all outputs match the scalar\nreference.\n\n### Release note\n\nImprove bit-packed integer decoding for bit widths below 16 on BMI2 and\nAVX2 capable x86 CPUs.\n\n### Check List (For Author)\n\n- Test:\n- [x] Unit Test: `./run-be-ut.sh -j 48 --run --filter\u003dBitPackingTest.*`\n- [x] Manual test: compiled and linked `benchmark_test`; ran scalar,\ndirect PDEP, and actual-path cases at 4K, 256K, and 1M values\n    - [x] `build-support/check-format.sh`\n- [x] `build-support/run-clang-tidy.sh --build-dir be/build_Release\n--files be/benchmark/benchmark_main.cpp\nbe/test/util/bit_packing_test.cpp`\n- Behavior changed: Yes. Supported x86 CPUs use PDEP for complete\n32-value batches only when `BIT_WIDTH \u003c 16`; all other cases retain the\nscalar path.\n- Does this need documentation: No\n\n---------\n\nCo-authored-by: Dongyang Li \u003clidongyang@selectdb.com\u003e"
    },
    {
      "commit": "25bdeb83e769eb162f9ecdcc03135894dbb00ffd",
      "tree": "34adc155144d8ec9d3aab759fb593de7b37d32ff",
      "parents": [
        "363c317fc3bec5b49b9810b735c803b63fc2dcdb"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Tue Jul 21 16:07:56 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 16:07:56 2026 +0800"
      },
      "message": "[fix](iceberg) Honor disabled write metrics (#65782)\n\n## Summary\n\n- honor Iceberg table metrics configuration when Doris creates data-file\nmetadata\n- omit all column metrics whose effective Iceberg metrics mode is `none`\n- omit bounds for `counts` and safely truncate string/binary bounds for\n`truncate(N)`\n- build the table metrics policy once per commit batch instead of once\nper output file\n- return unknown Iceberg column statistics when required file metrics\nare absent\n- preserve unknown statistics when closing an Iceberg file scan fails\n- add regression coverage for the write-to-statistics path\n\n## Root cause\n\nDoris collected column statistics in the backend and copied every\nstatistics map into the Iceberg `DataFile` manifest in the frontend. The\nconversion never consulted the table\u0027s `MetricsConfig`, so metadata for\ndisabled columns was persisted even though the physical file statistics\nwere available only as an implementation detail. The initial filtering\nalso treated every non-`none` mode like `full`, retaining bounds for\n`counts` and failing to safely truncate string/binary bounds for\n`truncate(N)`.\n\nAfter disabled metrics were correctly omitted, the downstream Iceberg\nstatistics reader still assumed `columnSizes` and `nullValueCounts`\nalways contained every column. It could therefore throw while loading\nstatistics for a table using metrics mode `none`. A scan-close failure\ncould also override an in-flight unknown-statistics return and expose\npartial or fabricated accumulators. 
The reader now treats missing\nrequired metrics and scan-close failures as unknown.\n\n## Testing\n\n- `IcebergWriterHelperTest` and `StatisticsUtilTest` (21 tests passed)\n- verified the new review regression tests fail before their production\nfixes and pass after them\n- FE CheckStyle validation (0 violations)\n- `git diff --check`\n\n## Links\n\n- Jira: http://39.106.86.136:8090/browse/DORIS-27023\n- TeamCity reproduction:\nhttp://172.20.48.17:8111/buildConfiguration/Doris_Doris_x64_Master_Trino_Case/201908"
    },
    {
      "commit": "363c317fc3bec5b49b9810b735c803b63fc2dcdb",
      "tree": "d1b04846c04ea4320f929b8cb9ce0d39c3d2471b",
      "parents": [
        "7216a090d8d9ab93e4127a2ffd960c0029ba3c55"
      ],
      "author": {
        "name": "Calvin Kirs",
        "email": "guoqiang@selectdb.com",
        "time": "Tue Jul 21 15:56:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 15:56:47 2026 +0800"
      },
      "message": "[chore](github)update collaborators (#65830)\n\nRemove inactive collaborators"
    },
    {
      "commit": "7216a090d8d9ab93e4127a2ffd960c0029ba3c55",
      "tree": "7c19e26fde1c2bddfa33063e6c284a119474fa06",
      "parents": [
        "6f4134cbe426f2385cf2515c3e409b3c2706c8d7"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Tue Jul 21 14:26:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 14:26:03 2026 +0800"
      },
      "message": "[fix](topn) Resolve topn lazy materialization column indexes for aliases (#65759)\n\n### What problem does this PR solve?\nRelated PR: #52114\n\nProblem Summary: TopN lazy materialization resolved deferred column\nindexes with output slot names. Queries that renamed Hive columns\ntherefore produced -1 indexes, and external row-id fetch could fill\nthose columns with NULL when positional reading was used. Resolve the\nindex from the already traced original base column so the descriptor and\nindex share one column identity. Add a focused planner unit test and a\nHive ORC Explain regression for the alias path.\n\nFix incorrect NULL values from aliased external-table columns when TopN\nlazy materialization is used."
    },
    {
      "commit": "6f4134cbe426f2385cf2515c3e409b3c2706c8d7",
      "tree": "554e9039bc4026b6326add6ab93124dcc82eebca",
      "parents": [
        "d3b9f5bdbc555ecccf71e18854c562294bd2a02e"
      ],
      "author": {
        "name": "924060929",
        "email": "lanhuajian@selectdb.com",
        "time": "Tue Jul 21 14:03:35 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 14:03:35 2026 +0800"
      },
      "message": "[fix](test) stabilize the flaky shuffle_left_join regression test (#65769)\n\nThe `nereids_syntax_p0/distribute/shuffle_left_join` regression test is\nflaky.\n\nWith the nereids distribute planner enabled, the join in this suite can\nbe\nplanned either as `INNER JOIN(PARTITIONED)` or as a left-to-right\n`INNER JOIN(BUCKET_SHUFFLE)` — the aggregated left side shuffled onto\nthe right\ntable\u0027s storage buckets — and the two candidates are chosen by cost.\nThat cost\nis not stable during the test:\n\n- the scan row count is reported asynchronously after the insert, so an\nunreported count clamps to 1 (which favors a partitioned shuffle), while\nthe\n  real count favors a bucket shuffle;\n- the bucket-shuffle downgrade gate (`isBucketShuffleDownGrade`) turns\nbucket\n  shuffle back into a partitioned shuffle when the bucket count is small\nrelative to the instance count, so the plan also depends on the number\nof\n  backends.\n\nAs a result the asserted plan flips between `INNER JOIN(PARTITIONED)`\nand\n`INNER JOIN(BUCKET_SHUFFLE)` depending on timing and cluster topology.\n\nThe suite is meant to assert the left-to-right bucket shuffle, which\nsaves one\nexchange versus a plain partitioned shuffle. This PR restores that\nassertion and\nmakes it deterministic:\n\n- `analyze table ... with sync` fixes the row count instead of racing\nthe async\ntablet report (the count is then read from the collected column stats);\n- `set bucket_shuffle_downgrade_ratio\u003d0` removes the dependency on the\nnumber of\n  backends.\n\nVerified on a 4-backend cluster: the suite now stably produces\n`INNER JOIN(PARTITIONED)` for the legacy-planner leg and\n`INNER JOIN(BUCKET_SHUFFLE)` for the distribute-planner leg across\nparallelism\n1/4/8 and both before/after stats reporting; the suite was run 3 times,\nall\ngreen."
    },
    {
      "commit": "d3b9f5bdbc555ecccf71e18854c562294bd2a02e",
      "tree": "8443740a11392f3312dffa96e996b7f0da75b5cc",
      "parents": [
        "ae88ed43c3fea8f5e00928c035ac57d1e475656b"
      ],
      "author": {
        "name": "Dongyang Li",
        "email": "lidongyang@selectdb.com",
        "time": "Tue Jul 21 11:07:15 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 11:07:15 2026 +0800"
      },
      "message": "[fix](ci) adapt PR checks to safer checkout defaults (#65833)\n\n### What problem does this PR solve?\n\nGitHub now blocks `actions/checkout` from checking out fork PR code in a\n`pull_request_target` workflow by default. As a result, License Check\nand the `pull_request_target` instance of Code Formatter fail during\ncheckout before their actual checks run.\n\nExample failure:\nhttps://github.com/apache/doris/actions/runs/29789987113/job/88526293992?pr\u003d64849\n\nGitHub announcement:\nhttps://github.blog/changelog/2026-06-18-safer-pull_request_target-defaults-for-github-actions-checkout/\n\n### What changed?\n\n- Run License Check for PRs with the unprivileged `pull_request` event.\n- Keep the incremental license configuration generation for PR files.\n- Remove the duplicate `pull_request_target` trigger and checkout path\nfrom Code Formatter.\n- Restrict License Check permissions to read-only access.\n- Upgrade both checkout steps from `actions/checkout@v3` to\n`actions/checkout@v7`.\n\nPush, workflow dispatch, and `run buildall` issue-comment behavior\nremain unchanged.\n\n### Check List\n\n- `actionlint .github/workflows/license-eyes.yml\n.github/workflows/clang-format.yml`\n- `git diff --check`\n\n### Validation note\n\nThis PR uses a same-repository branch so the default branch\u0027s existing\n`pull_request_target` workflows can bootstrap the change. Both the\nexisting target-side runs and the new `pull_request` runs completed\nsuccessfully for License Check and Code Formatter. After this change\nreaches `master`, fork PRs will only use the safe `pull_request`\ndefinitions."
    },
    {
      "commit": "ae88ed43c3fea8f5e00928c035ac57d1e475656b",
      "tree": "6eb86d4df0f6b96453fd70e6d2e8be0f0b10fd47",
      "parents": [
        "f5995f402b322fe65ed59bd62a450275c9d1d28e"
      ],
      "author": {
        "name": "Socrates",
        "email": "suyiteng@selectdb.com",
        "time": "Tue Jul 21 10:52:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 10:52:08 2026 +0800"
      },
      "message": "[fix](regression) Isolate non-catalog Kerberos test database (#65825)\n\nThe Kerberos external regression suites can run concurrently against the\nsame Hive Metastore. Both `test_non_catalog_kerberos` and\n`test_two_hive_kerberos` used the database name `test_krb_hive_db`.\n\nCreating an external HMS database with `IF NOT EXISTS` performs a check\nfollowed by a create operation. When both suites create the database\nconcurrently, both checks can observe that it is absent, and one request\nthen fails with an HMS `AlreadyExistsException`.\n\nUse a dedicated `test_non_catalog_krb_hive_db` database for the\nnon-catalog Kerberos suite. This isolates its fixture from the two-Hive\nsuite without changing the tested export, outfile, HDFS, or Kerberos\nbehavior."
    },
    {
      "commit": "f5995f402b322fe65ed59bd62a450275c9d1d28e",
      "tree": "050767b6d376ecb96c46403120d16833e869db8c",
      "parents": [
        "e51099a9389690784a96e221f74d69bbfee35e73"
      ],
      "author": {
        "name": "hui lai",
        "email": "laihui@selectdb.com",
        "time": "Tue Jul 21 10:20:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 10:20:36 2026 +0800"
      },
      "message": "[fix](load) support compute_group in FE stream load routing and planning (#65571)\n\n### What problem does this PR solve?\n\nPR #53031 added support for the `compute_group` header on the BE side,\nbut the FE Stream Load redirect path still only recognized the legacy\n`cloud_cluster` header.\n\nAs a result, clients sending Stream Load requests through FE could not\nuse `compute_group` to select the target compute group.\n\nThere was also an ambiguity during planning: if a request was sent\ndirectly to a BE in compute group A while specifying compute group B in\nthe header, propagating the header to the planning request could make\nthe receiving BE and the execution compute group inconsistent.\n\n### What is changed?\n\nThis PR adds `compute_group` support to the FE Stream Load redirect path\nand defines the planning behavior based on the receiving BE.\n\n- Make FE recognize the `compute_group` header when selecting the Stream\nLoad redirect target.\n- Give `compute_group` precedence over the legacy `cloud_cluster`\nheader.\n- Keep `cloud_cluster` as a compatibility fallback.\n- Make ordinary Stream Load pass the receiving BE\u0027s `backend_id` to FE.\n- Resolve the planning compute group from the receiving BE\u0027s\n`backend_id`.\n- Override the planning request\u0027s `cloud_cluster` with the receiving\nBE\u0027s compute group.\n- Keep the existing fallback behavior when `backend_id` is unavailable.\n\nHTTP Stream already passes the receiving BE\u0027s `backend_id`, so no\nadditional BE-side behavior change is required for HTTP Stream.\n\n### Behavior\n\n- Request through FE with `compute_group\u003dB`:\n  - FE redirects the request to a BE in B.\n  - The receiving BE passes its `backend_id` to FE.\n  - The load is planned and executed in B.\n\n- Request sent directly to a BE in A with `compute_group\u003dB`:\n  - The receiving BE passes its own `backend_id` to FE.\n  - The load is planned and executed in A.\n- The conflicting `compute_group` 
header does not change the planning\ncompute group.\n\n### Tests\n\n- Added FE unit tests for:\n  - `compute_group` header precedence during redirect.\n  - Fallback to the legacy `cloud_cluster` header.\n  - Resolving the planning compute group from `backend_id`.\n\n- Added cloud regression coverage for:\n  - FE Stream Load redirect using `compute_group`.\n  - Direct-to-BE Stream Load with a conflicting `compute_group` header."
    },
    {
      "commit": "e51099a9389690784a96e221f74d69bbfee35e73",
      "tree": "0805c837a0627da788f6e7966621895de6064401",
      "parents": [
        "181e0d640c0e55d401c36ec964e6fea638533b1f"
      ],
      "author": {
        "name": "Jerry Hu",
        "email": "hushenggang@selectdb.com",
        "time": "Tue Jul 21 09:56:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jul 21 09:56:03 2026 +0800"
      },
      "message": "[fix](fe) Treat json_extract_no_quotes as json_extract_string alias (#65380)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: `json_extract_no_quotes` had a separate FE Nereids\nfunction definition, so it did not share the same FE binding and rewrite\nbehavior as `json_extract_string`. This changes FE builtin function\nregistration so `json_extract_no_quotes` resolves through the existing\n`JsonbExtractString` implementation, making it an alias of\n`json_extract_string`. Regression expectations are updated for the alias\nsemantics.\n\n### Release note\n\n`json_extract_no_quotes` is treated as an alias of `json_extract_string`\nin FE.\n\n### Check List (For Author)\n\n- Test:\n- FE UT: `./run-fe-ut.sh --run\norg.apache.doris.nereids.rules.analysis.BindFunctionTest`\n- Build: `doris-local-regression --network 10.26.20.3/24 all -d\ndoc/sql-manual/sql-functions -s doc_json_functions_test -forceGenOut`\n- Regression test: `doris-local-regression --network 10.26.20.3/24 run\n-d doc/sql-manual/sql-functions -s doc_json_functions_test`\n- Regression test: `doris-local-regression --network 10.26.20.3/24 run\n-d query_p0/sql_functions/json_functions -s test_json_function`\n    - Code check: `git diff --check`\n- Behavior changed: Yes. `json_extract_no_quotes` now follows\n`json_extract_string` FE semantics.\n- Does this need documentation: No"
    },
    {
      "commit": "181e0d640c0e55d401c36ec964e6fea638533b1f",
      "tree": "0bb8a59b69a6682899b5523cc93ffef8b041a40d",
      "parents": [
        "8460676f3fcb6304f27dba32379da7d0fe0ad9d9"
      ],
      "author": {
        "name": "zgxme",
        "email": "zhenggaoxiong@selectdb.com",
        "time": "Mon Jul 20 20:03:30 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 20:03:30 2026 +0800"
      },
      "message": "[test](regression) Add Iceberg v3 row lineage coverage (#63719)\n\n### What problem does this PR solve?\n\nAdd regression coverage for Iceberg v3 row lineage scenarios, including\nDoris-created tables, Spark interoperability, DML mode matrices, schema\nevolution, time travel, deletion vector interoperability, large\nstability, and randomized cross-engine checks.\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [x] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [x] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [x] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "8460676f3fcb6304f27dba32379da7d0fe0ad9d9",
      "tree": "fc9c79d10c762e457c857192d5410c1a1c46f060",
      "parents": [
        "d2a3b48a9c8c3180e07031f28b903134134e3247"
      ],
      "author": {
        "name": "Nelson Boss",
        "email": "bosswnx@qq.com",
        "time": "Mon Jul 20 18:58:55 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 18:58:55 2026 +0800"
      },
      "message": "[Fix](nereids) Freeze sortedPartitionRanges in SelectedPartitions to prevent TOCTOU NPE during partition pruning (#65659)\n\n### What problem does this PR solve?\n\nIssue Number: #64800\n\nRelated PR: #58877\n\nProblem Summary:\n\nFix a TOCTOU (Time-of-Check Time-of-Use) race condition that causes\n`NullPointerException` during partition pruning on external tables.\n\n**Root cause:** In `PruneFileScanPartition.pruneExternalPartitions()`:\n\n1. `nameToPartitionItem` — frozen at T1 inside\n`LogicalFileScan.SelectedPartitions` when the plan node is constructed\n(via `initSelectedPartitions()`)\n2. `sortedPartitionRanges` — re-read from the `HivePartitionValues`\ncache at T2 when the pruning rule executes (via\n`externalTable.getSortedPartitionRanges()`)\n\nIf the cache is refreshed between T1 and T2 (e.g. concurrent `ALTER\nTABLE ADD/DROP PARTITION`), the two snapshots diverge.\n`binarySearchFiltering` uses the new snapshot to decide which partitions\nmatch the predicate, but the caller looks them up in the old snapshot:\n\n```java\nfor (String name : prunedPartitions) {\n    selectedPartitionItems.put(name, nameToPartitionItem.get(name));\n    // nameToPartitionItem.get(name) returns null for partitions that were added after T1\n}\n// \u003d\u003e ImmutableMap.copyOf() throws NPE: \"null value in entry: dt\u003d2026-06-22\u003dnull\"\n```\n\n**Concrete example:** A Hive table has 3 partitions\n`dt\u003d2026-06-20/21/23`. 
Session A runs `SELECT * FROM t WHERE\ndt\u003d\u00272026-06-22\u0027`:\n\n```\nT1  BindRelation: LogicalFileScan freezes nameToPartitionItem from cache\n    → {2026-06-20, 2026-06-21, 2026-06-23}   (no 2026-06-22)\n\n    [Session B runs ALTER TABLE ADD PARTITION (dt\u003d\u00272026-06-22\u0027)]\n    [cache is refreshed → now has 4 partitions including 2026-06-22]\n\nT2  PruneFileScanPartition: re-reads sortedPartitionRanges from cache\n    → {2026-06-20, 2026-06-21, 2026-06-22, 2026-06-23}   (new snapshot)\n    binarySearchFiltering matches dt\u003d2026-06-22 → returns \"dt\u003d2026-06-22\"\n    nameToPartitionItem.get(\"dt\u003d2026-06-22\") → null   (old snapshot has no such key)\n    → NPE: \"null value in entry: dt\u003d2026-06-22\u003dnull\"\n```\n\n**Fix:** freeze both views from a single snapshot so T2 never re-reads\nthe cache.\n- `SelectedPartitions` now carries an `Optional\u003cSortedPartitionRanges\u003e`\nfield.\n- `HMSExternalTable.initSelectedPartitions` reads the cached\n`HivePartitionValues` once and freezes both the partition map and the\ncached sorted ranges together (reuses the cache, no just-in-time\nrebuild).\n- Hudi has no cached ranges, so `PruneFileScanPartition` builds them\nlazily from the frozen map only when binary search filtering is enabled.\n- A missing partition in the lookup loop is now an invariant failure\n(`Preconditions.checkState`) instead of being silently skipped, which\npreviously produced a partial scan over fewer partitions.\n\n### Release note\n\nFix `NullPointerException` in partition pruning when external table\npartitions are modified concurrently during query optimization (TOCTOU\nrace in binary search partition filtering)."
    },
    {
      "commit": "d2a3b48a9c8c3180e07031f28b903134134e3247",
      "tree": "4fe2fc58d9852125c6750bc9b9ab0ea67dacacee",
      "parents": [
        "c83166309e4444bddd71cb79553ca12399ba4786"
      ],
      "author": {
        "name": "wudi",
        "email": "wudi@selectdb.com",
        "time": "Mon Jul 20 17:16:57 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 17:16:57 2026 +0800"
      },
      "message": "[fix](streaming-job) Isolate schemas for CDC snapshot splits (#65645)\n\n### What problem does this PR solve?\nDuring snapshot backfill, a snapshot split carried schemas for every\ncaptured table. A change record from another table could then be\nevaluated with the current split key, causing an invalid-field failure\nwhen the tables use different primary-key names.\n\nThis change limits each MySQL and PostgreSQL snapshot split to its own\ntable schema while retaining the reader-level schemas for the global\nstream split. It also strengthens the multi-table concurrent-DML cases\nby explicitly enabling snapshot backfill, using different primary-key\nnames, and keeping a source writer active across multiple successful\ntasks."
    },
    {
      "commit": "c83166309e4444bddd71cb79553ca12399ba4786",
      "tree": "9ced049f1ba5027aadf548a800f323667f386f2d",
      "parents": [
        "6322a095ba70c3e453215c4627390daa390aba48"
      ],
      "author": {
        "name": "wudi",
        "email": "wudi@selectdb.com",
        "time": "Mon Jul 20 16:19:38 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 16:19:38 2026 +0800"
      },
      "message": "[fix](fe) Keep CDC end offset consistent with current progress (#65688)\n\n### What problem does this PR solve?\n\nStreaming CDC jobs periodically fetch the source end offset, while a\nsuccessful task commits its current offset independently. A task can\nadvance beyond the previously fetched end offset. The FE previously\nconverted the offset comparison into a boolean, so it could stop\nscheduling but could not distinguish equality from the current offset\nbeing ahead, leaving CurrentOffset greater than EndOffset in job output.\n\nThis change preserves the three-way comparison result and advances\nEndOffset to CurrentOffset when the fetched end is behind. End-offset\npublication is synchronized so an obsolete comparison result cannot\noverwrite a newer fetched end. The PostgreSQL latest-offset credential\nregression case now waits for the first task to commit the resolved\nlatest offset before inserting new rows."
    },
    {
      "commit": "6322a095ba70c3e453215c4627390daa390aba48",
      "tree": "37c29f7c4148271df829fb528e0a9c1f5b64fe12",
      "parents": [
        "52e154530ff72b53d205f77d845a3072e44ea5f1"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Mon Jul 20 16:15:48 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 16:15:48 2026 +0800"
      },
      "message": "[fix](be) Materialize const columns before block merge (#65770)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary:\n\n`ScopedMutableBlock` can retain a `ColumnConst` destination, while block\nmerge materializes the incoming column before appending it. Appending\nthat full column to the const destination fails with an internal error.\n\nThis change materializes a const destination once before appending,\nkeeping the destination and source representations compatible for both\nregular and ignore-overflow merges.\n\n### Release note\n\nFix block merge failures when scanner padding produces constant nullable\ncolumns.\n\n### Check List (For Author)\n\n- Test\n    - [ ] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason\n\n- Behavior changed:\n    - [x] No.\n    - [ ] Yes.\n\n- Does this need documentation?\n    - [x] No.\n    - [ ] Yes.\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label"
    },
    {
      "commit": "52e154530ff72b53d205f77d845a3072e44ea5f1",
      "tree": "858567d649576f25f3969680da78c91ddbf14c7d",
      "parents": [
        "8b081ea92ac4894b7487632b9576ffe73bce72c0"
      ],
      "author": {
        "name": "Socrates",
        "email": "suyiteng@selectdb.com",
        "time": "Mon Jul 20 14:32:23 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 14:32:23 2026 +0800"
      },
      "message": "[improvement](regression) Reduce Kerberos test environment memory (#65564)\n\nProblem Summary:\n\nThe external regression pipeline currently runs two heavyweight Kerberos\ncontainers. Each container starts HiveServer2, YARN\nResourceManager/NodeManager, MariaDB, and other processes that the Doris\nKerberos cases do not use, resulting in roughly 4.3 GB of combined\nmemory usage.\n\nThis PR replaces those images with a minimal reusable environment based\non Apache Hive 3.1.3. Each realm now runs only:\n\n- a native MIT Kerberos KDC\n- one HDFS NameNode and DataNode\n- one Kerberos-enabled Hive Metastore backed by embedded Derby\n\nThe existing realm names, ports, service principals, client principals,\nand keytab filenames remain unchanged. JVM heaps are explicitly bounded,\nand each container has a 1.5 GB memory limit. HDFS listeners bind\nindependently from their advertised Kerberos hostnames, and startup now\nwaits until each DataNode is registered with its NameNode.\n\nThe container no longer owns Hive test fixtures.\n`test_single_hive_kerberos` and `test_two_hive_kerberos` now create\ntheir test database, Hive table, and four rows through Doris.\n`test_non_catalog_kerberos` also creates its database so it no longer\ndepends on suite execution order. Existing expected output files remain\nunchanged."
    },
    {
      "commit": "8b081ea92ac4894b7487632b9576ffe73bce72c0",
      "tree": "38fe90b0c90605a0991fa830586f6ccd640f35d7",
      "parents": [
        "2c697190ccd5861af8680d07debf94a3eea509ae"
      ],
      "author": {
        "name": "Steven Pall",
        "email": "mail@stevenpall.ca",
        "time": "Sun Jul 19 23:30:22 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 14:30:22 2026 +0800"
      },
      "message": "[fix](cloud) set recycler S3 client requestTimeoutMs to avoid curl-28 on slow DeleteObjects (mirror #49315) (#64758)\n\n## Proposed changes\n\nThe cloud recycler builds its S3 client in `S3Accessor::init()`\n(`cloud/src/recycler/s3_accessor.cpp`) from an\n`Aws::Client::ClientConfiguration` that never sets `requestTimeoutMs`,\nso\nit keeps the SDK default of 3000ms. The vendored aws-sdk-cpp maps that\nto\n`CURLOPT_LOW_SPEED_TIME\u003d3` / `CURLOPT_LOW_SPEED_LIMIT\u003d1`, so any slow or\nlarge `DeleteObjects` request that can\u0027t sustain \u003e1 byte/s for 3 seconds\nis aborted with curl error 28 (\"Timeout was reached\").\n\nThis is the same defect that #49315 fixed for the BE\n(`be/src/util/s3_util.cpp`, `requestTimeoutMs \u003d 30000`), but the cloud\nrecycler\u0027s client was missed by that change. On object stores with\nhigher per-request latency (e.g. an OVH cold object-storage vault) the\nrecycler wedges: every `delete_rowset_data` / `DeleteObjects` aborts at\n3s with curlCode 28, the recycler burns its delete budget on timed-out\nops, and the orphan backlog never drains.\n\nThis change sets `requestTimeoutMs \u003d 30000` and\n`connectTimeoutMs \u003d 5000` on the recycler client, mirroring #49315.\n\nSymptom in MS recycler log before the fix:\n```\ns3_obj_client.cpp: failed to delete objects ... responseCode\u003d-1 error\u003d\"curlCode: 28, Timeout was reached\"\nrecycler.cpp: failed to delete rowset data, instance_id\u003d...\n```\n\n## Further comments\n\nPure timeout-config change in the recycler S3 client path; no behavior\nchange for fast object stores. `MaxDeleteBatch` is left unchanged.\n\nSigned-off-by: Steven Pall \u003cmail@stevenpall.ca\u003e"
    },
    {
      "commit": "2c697190ccd5861af8680d07debf94a3eea509ae",
      "tree": "4fbb9f9c1ae527dc9d3a5e551f405d4495583871",
      "parents": [
        "fe24eb46847602be5deed43070f358eddca632a2"
      ],
      "author": {
        "name": "zhangstar333",
        "email": "zhangsida@selectdb.com",
        "time": "Mon Jul 20 14:21:35 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 14:21:35 2026 +0800"
      },
      "message": "[fix](be) Correct Arrow timestamps before Unix epoch in DATETIMEV2 deserialization (#65689)\n\n### What problem does this PR solve?\nProblem Summary:\n\nbrefore: auto utc_epoch \u003d static_cast\u003c**UInt64**\u003e(date_value);\n\nBut Arrow timestamps are signed integers. For example, 1969-12-31\n23:59:59.123456 UTC is represented as -876544 in timestamp[us]. After\nconversion to UInt64, it becomes a very large positive value, causing an\ninvalid DATETIMEV2 result.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [x] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "fe24eb46847602be5deed43070f358eddca632a2",
      "tree": "5f9a9add509830a4fe1d1717f341dbadc460ddaa",
      "parents": [
        "dfa3b018a4e3ee6cc99f7907d1d2ecc376d919c4"
      ],
      "author": {
        "name": "Gavin Chou",
        "email": "gavin@selectdb.com",
        "time": "Mon Jul 20 11:54:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 11:54:03 2026 +0800"
      },
      "message": "[fix](brpc) Make secondary package aliases non-owning (#65777)\n\n## Proposed changes\n\nFix DORIS-27128 in the brpc secondary package alias implementation.\n\nThe existing brpc secondary_package_name patch registers the secondary\nservice name by copying the primary ServiceProperty, and registers\nsecondary methods by copying the primary MethodProperty as owning\nentries. During brpc::Server destruction, ClearServices() iterates all\nservice and method map entries:\n\n- copied ServiceProperty with SERVER_OWNS_SERVICE can delete the same\nservice twice\n- copied MethodProperty with own_method_status\u003dtrue can delete the same\nMethodStatus twice\n\nThis matches the cloud MetaService secondary package shutdown/coredump\npath. The patch now makes secondary aliases non-owning:\n\n- secondary MethodProperty sets own_method_status\u003dfalse and shares the\nprimary MethodStatus\n- secondary ServiceProperty sets ownership\u003dSERVER_DOESNT_OWN_SERVICE and\nshares the primary service pointer only for lookup\n\nThe previous variant regression retry change and the TLS starter change\nhave both been removed from this PR.\n\nThe failed thirdparty CI was unrelated to the brpc patch application:\nbrpc had already built and installed successfully, then Linux/macOS\nfailed while building lance-c because runner cargo selected a newer Rust\ntoolchain and ethnum v1.5.2 failed with E0512. 
The PR now pins lance-c\nto Rust/Cargo 1.91.0 through rustup when available, and fails early with\na clear message otherwise.\n\n## Testing\n\n- bash -n thirdparty/build-thirdparty.sh\n- git diff --check\n- Checked failed thirdparty logs for Linux/macOS: both failed at ethnum\nv1.5.2 in lance-c after brpc install completed.\n- Verified the brpc patch with a temporary local apply check:\nreverse-applied the old patch from origin/master to the local brpc-1.4.0\nsource copy, then applied the updated patch successfully.\n\nIssue: DORIS-27128\n\nCo-authored-by: gavinchou \u003cgavinchou@apache.org\u003e"
    },
    {
      "commit": "dfa3b018a4e3ee6cc99f7907d1d2ecc376d919c4",
      "tree": "e4b507de2d582e6461c3bcf2371e46bcf5196e76",
      "parents": [
        "91a12e13170268812c0002e653b408eca002ca3f"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Mon Jul 20 11:46:25 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 11:46:25 2026 +0800"
      },
      "message": "[fix](iceberg) make iceberg deletion vector stable. (#65676)\n\n### What problem does this PR solve?\n Related PR: #59272\n\nDoris already supports Iceberg v3 deletion vectors, but several\nstability gaps remain:\n- Invalid Iceberg/Paimon offsets and lengths could reach cache lookup or\nmemory allocation.\n- Iceberg Puffin deletion-vector CRC32 was not verified.\n- Validation was inconsistent across legacy readers, format-v2 readers,\nnormal scans, and the `position_deletes` path.\n- The Iceberg reader enforced a 1 GiB limit, while the writer did not.\n\n### What is changed?\n\n- Validate Iceberg/Paimon deletion-vector descriptors before cache\nlookup and allocation.\n- Verify Iceberg Puffin CRC32 before bitmap decoding.\n- Report malformed Paimon framing as data-quality errors.\n- Validate Puffin metadata in normal FE scans and the `position_deletes`\npath.\n- Align the Iceberg writer/reader 1 GiB limit and separate\nIceberg/Paimon limits.\n- Add BE, FE, and format-v2 boundary tests.\n\n### Release note\n\nHarden Iceberg and Paimon deletion-vector validation and consistently\nenforce the 1 GiB Iceberg deletion-vector limit on write and read paths.\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [x] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. 
eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "91a12e13170268812c0002e653b408eca002ca3f",
      "tree": "0232f3fc8754920659f2eba1b517562b9bf440a4",
      "parents": [
        "f39116498dc8858a5311d54566d85eaacba318f6"
      ],
      "author": {
        "name": "Mingyu Chen (Rayner)",
        "email": "yunyou@selectdb.com",
        "time": "Mon Jul 20 10:39:26 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 10:39:26 2026 +0800"
      },
      "message": "[fix](be) Fix macOS BE compilation errors (#65783)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: Clang 20 fails the macOS BE build because the\nLinux-only PHDR cache state is unused on macOS and Parquet boolean\ncolumn-index values are represented by std::vector\u003cbool\u003e proxy\nreferences. Restrict the PHDR state field to supported Linux builds and\nmaterialize Parquet page bounds as their physical C++ value types before\ndecoding.\n\nCo-authored-by: morningman \u003cmorningman@morningmandeMacBook-Pro.local\u003e"
    },
    {
      "commit": "f39116498dc8858a5311d54566d85eaacba318f6",
      "tree": "fd05c6b1efb191971bef1e85a2b91697a180c000",
      "parents": [
        "54e0e9712b6057d5f1c643103d1612d196b3cd35"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Mon Jul 20 10:06:49 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jul 20 10:06:49 2026 +0800"
      },
      "message": "[fix](scanner)(nereids) Harden FileScannerV2 and fix external COUNT pushdown semantics (#65548)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nThis PR fixes two correctness areas in external file scans.\n\n#### 1. Preserve external `COUNT(*)` / `COUNT(column)` semantics\n\nExternal `COUNT(*)` retained an arbitrary scan slot after column\npruning. BE then inferred aggregate semantics from that post-pruning\nscan shape and could treat the retained nullable placeholder as the\nargument of `COUNT(column)`. This produced wrong results, including 9015\ninstead of 10000 rows for special ORC data and 116 instead of 219 rows\nfor Hive basic types.\n\nThe fix transports semantic COUNT arguments from Nereids through\n`TPlanNode.push_down_count_slot_ids` to both file-scanner\nimplementations and their table/hybrid readers. The thrift field has an\nexplicit compatibility contract:\n\n- absent: an old FE plan with unknown COUNT semantics; use the\nconservative normal-scan path;\n- present and empty: explicit `COUNT(*)` / row-count semantics;\n- present and non-empty: explicit `COUNT(column)` arguments, which may\nuse metadata only when every mapping is proven safe.\n\nNereids no longer applies storage COUNT pushdown to cast arguments whose\nnull/error behavior would be lost. FE split planning also restricts\nIceberg/Paimon metadata-count split reduction and Hive/TVF no-split\nshortcuts to explicit row-count semantics, so a `COUNT(column)` fallback\nretains all data splits and normal scan parallelism. BE V1, V2, Hudi,\nPaimon, and table-reader paths preserve the same argument state, reject\nunsafe metadata shortcuts, and keep adaptive batching enabled for\nreal-row fallbacks.\n\n#### 2. 
Harden FileScannerV2 schema and reader edge cases\n\nFileScannerV2 treated valid empty splits as failures, could count\nstopped EOF or malformed Native input as an empty file, localized\nevolved nested predicates without fully preserving type/nullability\ncontracts, and rejected whole Parquet schemas when only unprojected\nleaves used unsupported logical types. Parquet page-cache range metadata\nalso lost warm non-exact reuse while lacking a bound.\n\nThe fix distinguishes stopped reads, valid empty input, and\nmalformed/truncated input; keeps unsafe nested predicates above\n`TableReader`; validates only semantically required Parquet projections\nwhile preserving strict checks for real predicate and `COUNT(column)`\ninputs; uses safe row-position/default carriers for placeholder-only\npaths; and shares bounded per-file page-range indexes without adding a\nprocess-wide lock to the `ReadAt` hot path.\n\n### Release note\n\nFix wrong external `COUNT(*)` / `COUNT(column)` results and unsafe\nmetadata pushdown across Nereids, thrift, FE split planning, and BE file\nreaders. 
Also harden FileScannerV2 handling of empty or interrupted\ninput, evolved nested predicates, unsupported unprojected Parquet\nlogical types, COUNT placeholders, hybrid readers, and bounded\ncross-reader page-cache range reuse.\n\n### Check List (For Author)\n\n- Test: Regression test / Unit Test\n- BE ASAN unit coverage for V1/V2 COUNT semantics, table-level metadata\ngates, schema-evolution fallbacks, COUNT placeholders, adaptive\nbatching, and Hudi/Paimon/Iceberg hybrid paths\n- FE unit coverage for Nereids COUNT argument capture and\nHive/Iceberg/Paimon/TVF split-planning behavior\n    - `ColumnMapper*.*` BE ASAN suite (111 tests passed)\n- Existing external regression cases `test_special_orc_formats` and\n`test_hive_basic_type` reproduce the fixed COUNT(*) wrong results\n- `test_file_scanner_v2_review_fixes` regression suite covers empty\ninput, evolved nested types, unsupported Parquet placeholders, and COUNT\nfallback behavior\n- Behavior changed: Yes (COUNT semantics are transported explicitly;\nunsafe or ambiguous metadata COUNT plans fall back to normal scanning;\nvalid empty splits are skipped; stopped or malformed input is not\ncounted as empty; unsupported unprojected Parquet leaves no longer fail\nunrelated scans)\n- Does this need documentation: No"
    },
    {
      "commit": "54e0e9712b6057d5f1c643103d1612d196b3cd35",
      "tree": "713b25f320b74e58249c7da210feb854b73a0037",
      "parents": [
        "a94d7b86eeba672fcd455e5a6afaa5515bcc1d61"
      ],
      "author": {
        "name": "zhangstar333",
        "email": "zhangsida@selectdb.com",
        "time": "Sat Jul 18 19:34:11 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jul 18 19:34:11 2026 +0800"
      },
      "message": "[bug](iceberg) use split file format for iceberg scan (#65760)\n\n### What problem does this PR solve?\nProblem Summary:\nA table maybe contain mixed parquet and orc files.\nbefore used the table-level format for every split files, it\u0027s should\nuse from data file.\nand when table properties were absent, maybe trigger full planFiles()\ncall for every splits, this cause mush times."
    },
    {
      "commit": "a94d7b86eeba672fcd455e5a6afaa5515bcc1d61",
      "tree": "6644b9e5e201514939a28a43d6d2270eee3b185c",
      "parents": [
        "1127003fc880ee3d2bc0e2191942a58813790185"
      ],
      "author": {
        "name": "Benedict Jin",
        "email": "asdf2014@apache.org",
        "time": "Sat Jul 18 15:10:33 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jul 18 15:10:33 2026 +0800"
      },
      "message": "[feature](query-cache) Support incremental merge for stale query cache entries (#65482)\n\nProblem Summary:\n\nFor hourly batch-load workloads, every load bumps the tablet version of\nthe\nhot partition, so its query cache entries never hit again: each \"last N\ndays\"\naggregation recomputes the whole hot partition every hour, and BE CPU\nspikes\nduring load windows.\n\nThis PR adds **incremental merge** for the query cache (experimental,\ndefault\noff). When a cache entry\u0027s version falls behind, BE reuses it instead of\ndiscarding it: it scans only the delta rowsets in\n`(cached_version, current_version]`, emits their partial aggregation\nstate\nside by side with the cached partial blocks so the existing upstream\nmerge\naggregation combines both, and writes the merged entry back under the\nnew\nversion. The execution plan is unchanged.\n\n![Full recompute vs incremental\nmerge](https://raw.githubusercontent.com/asdf2014/doris-website/query-cache-incremental-merge-docs/static/images/next/query-cache/incremental-merge-architecture.svg)\n\n**Measured impact** (80M-row duplicate-key table, 200k-row hourly\nappends,\ngroup by a non-distribution column): a stale query drops from **~307\nms**\n(full recompute) to **~19 ms** with incremental merge, about **16x** and\non\npar with an exact hit (~17 ms); the cost no longer grows with the base\ndata\nsize (8M rows: 48 ms full vs 19 ms incremental; 80M rows: 307 ms vs 19\nms).\nFull numbers in the Verification section below.\n\nDesign highlights:\n\n1. Correctness algebra: `S(v2) \u003d S(v1) ⊎ Δ` for append-only data plus\nthe\n   homomorphism `partial(A ⊎ B) ≡ merge(partial(A), partial(B))`. Every\nprecondition guards one of the two properties; any violation falls back\n   to a full recompute silently, so results are always correct.\n2. 
FE authorizes via a new optional thrift field\n(`TQueryCacheParam.allow_incremental`) only for non-finalize\naggregations\n   directly above the olap scan on an append-only index: DUP_KEYS, or\nmerge-on-write UNIQUE_KEYS. Merge-on-read UNIQUE and AGG tables always\n   fall back.\n3. BE re-validates per tablet at decision time (fallback reasons are\nvisible\n   in the profile): cloud mode, version order, the compaction threshold\n(`query_cache_max_incremental_merge_count`, default 8, 0 disables), keys\ntype, delta capturability on the version graph, delete predicates in the\ndelta, and for merge-on-write tables a delete-bitmap window check that\nrejects loads rewriting pre-existing keys. A rare backfill therefore\ncosts\n   exactly one full recompute, which re-bases the entry, and the next\n   pure-append load is incremental again.\n4. A fragment-level `QueryCacheRuntime` makes one idempotent decision\nper\ninstance (HIT / INCREMENTAL / MISS), pins the entry and pre-captures the\n   delta read source. This also fixes three pre-existing defects: the\nscan/cache-source double-lookup race (could write back an empty poisoned\n   entry), row-binlog scans polluting the cache, and `build_cache_key`\nfailures aborting the query instead of degrading to uncached execution.\n5. 
Entropy control: every merge appends the delta blocks to the entry,\nso\n   after `query_cache_max_incremental_merge_count` merges the next query\n   recomputes in full and compacts the entry; oversized entries keep the\n   delta-scan benefit but skip the write back.\n\nVerification:\n\n- FE UT: `QueryCacheNormalizerTest` 13/13 (8 assertions on the\nincremental\nauthorization matrix: switch, plan shapes, DUP / MoW / MoR / AGG /\nnested\n  aggregation).\n- BE UT: 34/34 across the decision layer, the incremental scenarios (MoW\npure-append / history-rewrite / irrelevant bitmap entries, version gap,\ncapture error via debug point, delete predicates) and the operator\nlayer.\n  llvm-cov: zero uncovered changed lines; `query_cache.cpp` at 100% line\n  coverage.\n- Regression: `query_cache_incremental` passes on a real 1FE+1BE\ncluster;\nevery step is checked against a cache-off baseline. BE metrics confirm\nthe\nincremental path actually fires (`stale_hit_total` +40 across the suite,\n  fallbacks exactly at the three designed spots).\n- Benchmark (80M-row DUP table, 200k-row hourly appends, group by a\n  non-distribution column, 5 rounds each):\n\n  | scenario | latency (median) |\n  |---|---|\n  | no cache, full scan | ~446 ms |\n  | exact hit | ~17 ms |\n  | stale + incremental merge (this PR) | ~19 ms |\n  | stale + full recompute (switch off) | ~307 ms |\n\n  A stale query becomes as cheap as an exact hit, and its cost no longer\n  grows with the base data size (8M rows: 48ms full vs 19ms incremental;\n  80M rows: 307ms vs 19ms).\n\n### Release note\n\nAdd experimental incremental merge for the query cache: a stale entry\ncan be\nreused by scanning only the delta rowsets and merging them with the\ncached\npartial aggregation state. 
Controlled by the session variable\n`enable_query_cache_incremental` (default off) and the BE config\n`query_cache_max_incremental_merge_count` (default 8).\n\n### Check List (For Author)\n\n- Test\n    - [x] Regression test\n    - [x] Unit Test\n\n- Behavior changed:\n    - [x] No.\n\n- Does this need documentation?\n- [x] Yes. Document PR:\nhttps://github.com/apache/doris-website/pull/3975"
    },
    {
      "commit": "1127003fc880ee3d2bc0e2191942a58813790185",
      "tree": "220cdef1a406e96b290033a9cc05d717e12ae7a0",
      "parents": [
        "513cd20af5d0e52fdd77b8afeeebd1bcfb4a6058"
      ],
      "author": {
        "name": "Mingyu Chen (Rayner)",
        "email": "yunyou@selectdb.com",
        "time": "Sat Jul 18 12:03:05 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jul 18 12:03:05 2026 +0800"
      },
      "message": "[fix](arrow-flight) Keep coordinator alive across GetFlightInfo/DoGet for external table scan (#64799)\n\n### What problem does this PR solve?\n\nIssue Number: close #62259\n\nRelated PR: #64797\n\nProblem Summary:\n\nArrow Flight SQL queries against Iceberg (and other external) tables in\nbatch split mode crashed the BE / failed with `Split source X is\nreleased`.\n\nArrow Flight executes a query in two phases: `GetFlightInfo` (plan +\nsubmit to BE) and `DoGet` (the client pulls results from the BE). For an\nexternal table scan in batch split mode, the BE keeps scanning during\n`DoGet` and lazily fetches file splits from the FE via the\n`fetchSplitBatch` RPC, using an async `SplitSource` that the FE\ncoordinator holds (through its scan nodes).\n\nThe FE closed the coordinator at the end of `GetFlightInfo`\n(`StmtExecutor.executeAndSendResult`\u0027s `finally` → `Coordinator.close()`\n→ `ScanNode.stop()` → `SplitSourceManager.removeSplitSource()`) and also\nunregistered it (`FlightSqlConnectProcessor.close()` →\n`StmtExecutor.finalizeQuery()`). So by the time the BE called\n`fetchSplitBatch` during `DoGet`, the `SplitSource` was already gone.\nThe MySQL protocol is unaffected because plan + execute share one\nrequest, so the coordinator stays alive until all results are consumed.\n\nThis PR keeps the coordinator (and its `SplitSource`) alive across the\ntwo phases and cleans it up reliably:\n\n- **StmtExecutor**: for an Arrow Flight query that produces results on\nthe BE (`coordBase \u003d\u003d coord`), mark it deferred, register the executor\non the `ConnectContext`, and skip the eager `Coordinator.close()` in the\n`finally`. 
A failed query (whose `exec()` threw) is not deferred and is\nclosed as before.\n- **ConnectContext**: hold the deferred executors and add\n`closeFlightSqlDeferredExecutors()`, which closes their coordinators\n(releasing the `SplitSource` and the query queue slot) and unregisters\nthe queries.\n- **FlightSqlConnectProcessor.close()**: do not finalize deferred\nexecutors.\n- **DorisFlightSqlProducer**: finalize the previous query\u0027s deferred\ncoordinator when the next query starts on the connection.\n- **FlightSqlConnectPoolMgr.unregisterConnection()**: finalize deferred\ncoordinators when the connection is torn down. All teardown paths\n(idle/query timeout, bearer token expiry, explicit `CloseSession`) reach\nhere, so an abandoned connection cannot leak the coordinator.\n\nNon-Arrow-Flight paths (MySQL, internal tables, point queries) are\nunchanged: `deferredForArrowFlight` can only become true for\n`ARROW_FLIGHT_SQL`.\n\nThe BE-side error-path hardening (so any `fetchSplitBatch` failure fails\ngracefully instead of crashing the BE) is handled separately in #64797."
    },
    {
      "commit": "513cd20af5d0e52fdd77b8afeeebd1bcfb4a6058",
      "tree": "ef5e4823da32c203f3cfb4caaabbdabdca75e9ef",
      "parents": [
        "8a3d7414198f71c83fbb837a0bf08b81390e6ee3"
      ],
      "author": {
        "name": "nsivarajan",
        "email": "117266407+nsivarajan@users.noreply.github.com",
        "time": "Sat Jul 18 09:30:26 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jul 18 12:00:26 2026 +0800"
      },
      "message": "[Improve](Audit) Auditlog on FE with https enabled (#64697)\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\n **Problem**\n\nWhen enable_https\u003dtrue and http_port\u003d0 (HTTP disabled), the built-in\naudit plugin fails silently on every batch. AuditStreamLoader hardcoded\nhttp://127.0.0.1:{http_port} at construction time, so the stream load\nURL became http://127.0.0.1:0 — an unreachable address. Every\nloadBatch() call threw a connection error, which was swallowed and\ncounted as discarded logs. No crash, no clear error — audit logging\nsimply stopped\n  working.\n\n  **Fix**\n\n- Build the audit loader\u0027s stream-load URL with the correct scheme\n(https when enable_https\u003dtrue) and port (HttpURLUtil.getHttpPort(),\nwhich returns https_port when HTTPS is enabled).\n- Introduce InternalHttpsUtils as a shared SSL context utility for\ninternal FE HTTP(S) clients. It trusts this FE\u0027s own HTTPS keystore\n(Config.key_store_path/key_store_type/key_store_password) — not\nmysql_ssl_default_ca_certificate, which is a separate,\nindependently-configured trust store scoped to MySQL wire-protocol SSL\nwith no guaranteed relationship to the FE HTTPS cert.\n- Force the FE\u0027s 307 redirect to the Backend to always use http,\nregardless of the scheme the client used to reach the FE\n(RestBaseController.buildRedirectUrlToBackend/redirectToBackend, used by\nall BE-facing\nredirects in LoadAction). 
BE\u0027s stream-load listener never terminates\nTLS, so this redirect must not inherit https from the inbound request —\nthis was the actual reason the fix wasn\u0027t sufficient without it.\n\n**Additional fixes picked up while rebasing onto the now-merged #60921\n(FE-to-FE HTTPS):**\n- #60921\u0027s own copy of InternalHttpsUtils had the same\nmysql_ssl_default_ca_certificate trust-anchor issue described above —\ncorrected to the same FE-keystore-based model.\n- HttpUtils.executeRequest() was selecting the HTTPS client based on the\nglobal enable_https flag rather than the request\u0027s actual URI scheme,\nwhich could build (and fail on) the HTTPS client for plain http://\n  Backend manager calls. Now gated on request.getURI().getScheme().\n- Removed a duplicate getHttpPort() definition in HttpURLUtil.java (a\ncompile blocker introduced by #60921).\n\n  **Background**\n\nThe audit log plugin runs inside the FE process and works like a\nstream-load client: to persist its own audit rows, it submits a\nstream-load request to\n127.0.0.1:{port}/api/__internal_schema/audit_log/_stream_load — the FE\u0027s\nown REST/Jetty listener, the same one that serves stream-load\nsubmissions, REST APIs, and the web UI. Config.http_port (8030) or\nConfig.https_port (8050) is simply which port it dials to reach that\nlistener, selected by enable_https.\n\nThe FE itself never ingests the data. It authenticates the request,\nselects a Backend, and responds with a 307 Temporary Redirect pointing\nat that Backend\u0027s own HTTP port (webserver_port, e.g. 8040 — a separate\nport on a separate process). The audit loader follows the redirect and\nsends the actual batch there. 
BE\u0027s port is unrelated to the FE\u0027s\nhttp_port/https_port, and BE never serves HTTPS regardless of the FE\u0027s\n  configuration.\n\nThis two-step path is why fixing the initial URL alone isn\u0027t sufficient\n— the FE\u0027s 307 redirect must still resolve to plain http:// for the\nBackend even though the request that produced it arrived over https,\notherwise it points at a Backend address that will never complete a TLS\nhandshake.\n\n  **Behaviour**\n\n  | Config | Before | After |\n  |---|---|---|\n  | `enable_https\u003dfalse` | `http://127.0.0.1:8030` | unchanged |\n| `enable_https\u003dtrue`, `http_port\u003d0` | `http://127.0.0.1:0` (fails) |\n`https://127.0.0.1:8050` → FE; `http://be:port` → BE |\n  \n**Deployment requirement (please read before enabling this in a multi-FE\ncluster)**\n\nInternalHttpsUtils trusts the exact certificate found at\nConfig.key_store_path — it does not validate against a CA. This works\ncorrectly for:\n  - the audit loader (always calls 127.0.0.1, i.e. itself), and\n- genuine FE-to-FE calls, only if every FE node in the cluster is\ndeployed with the identical key_store_path keystore file.\n\nIf each FE node instead has its own distinct certificate — even if all\nare signed by the same CA — FE-to-FE HTTPS calls will fail certificate\nvalidation, because we trust the specific leaf certificate (via\nKeyStore.getCertificate()), not the CA that issued it. Making that\nscenario work would require separately importing the CA certificate into\neach node\u0027s keystore as its own trusted-certificate entry, or a\ndedicated CA-based trust store — out of scope here. 
There is currently\nno config-level enforcement or documentation of this requirement\nanywhere in the codebase; this PR doesn\u0027t add any, so please treat this\nnote\n  as the interim source of truth until a proper deployment guide exists.\n\n  **Notes**\n\nThis PR also serves as a correctness fix for #60921 (already merged)\nrather than purely preparatory work for it, since the trust-anchor and\nscheme-selection issues above live in code that PR introduced."
    },
    {
      "commit": "8a3d7414198f71c83fbb837a0bf08b81390e6ee3",
      "tree": "189d8b66329279b5ff7f09244c9614dc2ee44f3e",
      "parents": [
        "bc071ff7c93f60f4c7f92317f43c32b779496bd7"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Fri Jul 17 23:06:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 23:06:32 2026 +0800"
      },
      "message": "[fix](fe) Support ClickHouse JDBC V2 metadata (#65709)\n\nClickHouse JDBC V2 exposes databases as schemas and\nreports distributed tables as `REMOTE TABLE`. Doris inferred the\ndatabase metadata mode from the legacy `databaseterm` URL parameter and\nfiltered the vendor table type, so V2 catalogs could not discover\ndatabases or distributed tables. This change uses JDBC metadata\ncapabilities to select catalog/schema mode and includes remote tables in\ndiscovery in both JDBC client implementations."
    },
    {
      "commit": "bc071ff7c93f60f4c7f92317f43c32b779496bd7",
      "tree": "25efab67f3d84da03cf0d25788edaaa7177d147a",
      "parents": [
        "163a4a577e7fe4ffe8b7175582abefaf8bb855ea"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Fri Jul 17 23:04:50 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 23:04:50 2026 +0800"
      },
      "message": "[fix](regression) Wait past unsafe time boundaries (#65742)\n\nThe regression framework truncated the remaining time\nbefore an hour or day boundary to whole seconds. Sleeping for that\ntruncated value could resume in the final fractional second before the\nboundary, allowing a boundary-sensitive case to cross it and fail\nintermittently. This change computes the wait in milliseconds and sleeps\none additional millisecond so execution resumes strictly after the\nboundary. It also adds focused coverage for fractional, exact, and safe\nboundary windows and aligns the JUnit API with the engine already\nprovided by Groovy."
    },
    {
      "commit": "163a4a577e7fe4ffe8b7175582abefaf8bb855ea",
      "tree": "85035d5a042d3780612357f17a90bfd972c2fbe8",
      "parents": [
        "f21a72e623508bac15b485d7ed20420dd98ab05a"
      ],
      "author": {
        "name": "morrySnow",
        "email": "zhangwenxin@selectdb.com",
        "time": "Fri Jul 17 19:14:51 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 19:14:51 2026 +0800"
      },
      "message": "[fix](constant folding) Align FE constant folding results with execution (#65319)\n\n### What problem does this PR solve?\n\nProblem Summary: \nConstant folding could diverge from BE execution for date and time\ncasts, decimal string formatting, str_to_date formats expressed as\nconstant expressions, and field arguments containing NaN. Align literal\nhandling with BE behavior and add focused unit and regression coverage.\n\n### Release note\n\nFixes incorrect constant-folded results for affected expressions."
    },
    {
      "commit": "f21a72e623508bac15b485d7ed20420dd98ab05a",
      "tree": "d68351f8bf47a11b4a1d34fed4b8f3c0e92bdb93",
      "parents": [
        "6cb634e351cbb5b8251e55a8f18522673de7131b"
      ],
      "author": {
        "name": "Gavin Chou",
        "email": "gavin@selectdb.com",
        "time": "Fri Jul 17 19:11:57 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 19:11:57 2026 +0800"
      },
      "message": "[fix](cloud) Ignore libfdb ASAN failures in cloud UT runner (#65749)\n\n## Proposed changes\n\nCloud UTs can fail at process exit with an existing ASAN crash in\n`libfdb_c.so`, while the gtest cases themselves have already run. The\nrunner should not skip affected tests.\n\nThis changes `cloud/script/run_all_tests.sh` to:\n\n- still execute each `_test` binary normally\n- tee stdout/stderr to a temporary log\n- preserve the test process exit code via `PIPESTATUS[0]`\n- treat a non-zero result as success only when that test output contains\nboth ASAN text and `libfdb_c.so`\n\nA tiny blank-line-only change in `cloud/test/stopwatch_test.cpp` is\nincluded to trigger cloud UT checks.\n\nRelated TeamCity failure: `SelectdbCore_Cloudut` build 49298 showed\n`AddressSanitizer:DEADLYSIGNAL` with `fdb_transaction_set_option\n(libfdb_c.so+0xd889c5)`.\n\n## Testing\n\n```bash\nbash -n cloud/script/run_all_tests.sh\nbash /data/data8/gavinchou/workspace/agent-workspace/tasks/fix-cloud-ut-asan/mock_run_all_tests_check.sh \\\n  /data/data8/gavinchou/workspace/agent-workspace/tasks/fix-cloud-ut-asan/apache-doris/cloud/script/run_all_tests.sh\nsh format_code.sh cloud/test/stopwatch_test.cpp\n```\n\nCo-authored-by: gavinchou \u003cgavinchou@apache.org\u003e"
    },
    {
      "commit": "6cb634e351cbb5b8251e55a8f18522673de7131b",
      "tree": "f1f9038b7200d5d98d7aad33c169eaea90e85132",
      "parents": [
        "d8e75cf63cd8372a1d7e4b68d629859d9dfa75dc"
      ],
      "author": {
        "name": "nsivarajan",
        "email": "117266407+nsivarajan@users.noreply.github.com",
        "time": "Fri Jul 17 14:39:34 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 17:09:34 2026 +0800"
      },
      "message": "[Fix](Query Stats) Add QueryStatsRecorder for column-level query and filter - Part3 (#63974)\n\n### What problem does this PR solve?\n\nIssue Number: close #65423 \n\nRelated PR: #63067, #63768 \n\n**Problem Summary:**\n\nFollow-up of Part 2 (#63768). Extends column-level query/filter hit\nrecording to cover the constructs that were explicitly deferred from\nPart 2, plus additional cases discovered during full plan-node audit:\n\nDeferred from Part 2 (now resolved):\n- UNION / INTERSECT / EXCEPT —\nPhysicalSetOperation.getRegularChildrenOutputs() maps each child\nbranch\u0027s slots to queryHit; for EXCEPT and INTERSECT, the output-slot\nlink (used to resolve a filter kept above the\nset op) is further restricted to the first (build-side) branch only,\nsince both operators materialize their output block exclusively from\nbranch 0 at execution time — later branches are scanned only to\n  probe/exclude and never contribute values\n- CTE columns — PhysicalCTEProducer child is walked explicitly before\nsibling nodes so scan slots are registered; PhysicalCTEConsumer maps\nconsumer-side ExprIds to producer scan slots\n- LATERAL VIEW / EXPLODE — PhysicalGenerate generator input columns\nrecorded as queryHit; generator output slots are linked positionally to\ntheir own generator\u0027s inputs so a parent filter/GROUP BY on the exploded\n  column resolves back to the source\n- HAVING SUM(k2) \u003e 0 — single-input aggregate output ExprId is mapped to\nthe underlying scan so a parent PhysicalFilter records filterHit on k2\n\n  Additional gaps closed by full plan-node audit:\n- HAVING SUM(k2+k3) \u003e 0 — multi-input aggregate outputs populate\naggOutputToInputSlots; recordInputSlotsAsFilterHit expands to all\ncontributing columns\n- Mark join conjuncts — AbstractPhysicalJoin.getMarkJoinConjuncts() is a\nseparate field not included in hash or other conjuncts; adds filterHit\nfor IN/EXISTS subquery correlation columns\n- Other join conjuncts — 
getOtherJoinConjuncts() (non-equi join\npredicates) now records filterHit, previously untested and unhandled\n- Recursive CTE — PhysicalRecursiveUnion extends PhysicalBinary (not\nPhysicalSetOperation); base-case columns get queryHit via\ngetRegularChildrenOutputs(), and recursive-case\nPhysicalWorkTableReference slots are\nnow linked (by CTEId + position) to their base-case counterparts before\nthe recursive branch itself is walked, so a filter or join predicate\ninside the recursive branch that references a work-table column now\n  correctly resolves filterHit back to the real base-table column\n- Computed SELECT expressions — SELECT k1+k2 AS result now records\nqueryHit on both k1 and k2 via the multi-input branch of the\nPhysicalProject alias handler, consistent with how ORDER BY k1+k2 and\nGROUP BY k1+k2\n  already behave\n- HAVING GROUPING(a) / GROUPING_ID(...) — PhysicalRepeat (ROLLUP/CUBE)\noutput expressions are linked to their own inputs the same way aggregate\nHAVING outputs are, so a filter on a grouping-function output\n  resolves back to the real column\n- Positional window functions — ROW_NUMBER() / RANK() / DENSE_RANK()\nhave no function arguments, so a QUALIFY-style filter above them now\nresolves via the window expression\u0027s full input set (function args +\nPARTITION BY + ORDER BY keys) instead of just the function\u0027s own (empty)\ninputs\n\n  Accepted as-is (not a defect):\n- Set-op branch columns retained by ColumnPruning purely for\nrow-comparison (not otherwise selected) still get counted as a queryHit\non that branch. Narrowing this would flip currently-correct passing\ntests and\nrisks making an entire table disappear from SHOW QUERY STATS when its\nonly touched column is used solely for exclusion/comparison — a worse\noutcome than the current imprecision. 
Left unchanged; revisit only if it\n  proves to matter in practice.\n\n  Out of scope (intentional):\n- External tables (Hive / Iceberg / JDBC / ODBC) — OlapTable-only by\ndesign\n\nCo-authored-by: Sivarajan Narayanan \u003cnarayanan_sivarajan@apple.com\u003e"
    },
    {
      "commit": "d8e75cf63cd8372a1d7e4b68d629859d9dfa75dc",
      "tree": "13e6b6edc2c5ec502d49d0a483763a398ec8fdf4",
      "parents": [
        "662dc3599db790fa1b3d0e3bf455abace1561b22"
      ],
      "author": {
        "name": "Socrates",
        "email": "suyiteng@selectdb.com",
        "time": "Fri Jul 17 16:03:48 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 16:03:48 2026 +0800"
      },
      "message": "[fix](iceberg) Project Iceberg system table scans (#65262)\n\nIceberg system table scans serialize Iceberg SDK `FileScanTask` objects\nand the Java metadata scanner materializes rows from the task schema.\nDoris did not pass the actually required system table columns into the\nIceberg SDK scan, so metadata tables such as `$files` and `$data_files`\nwere planned with the full schema. For files metadata tables, the full\nschema can include virtual `readable_metrics` even when the SQL only\nrequests columns such as `file_size_in_bytes`.\n\nThis PR applies SDK top-level column projection only for Iceberg system\ntable scans. Normal Iceberg data table scans continue to rely on Doris\nBE scan range params for column pruning.\n\n### What changed?\n\n- Add system-table-only `TableScan.select(...)` projection in\n`IcebergScanNode` before planning the Iceberg SDK scan.\n- Skip synthesized/global row id and Iceberg row lineage columns when\nbuilding the SDK projection list.\n- Add regression coverage for ORC and Parquet Iceberg tables with\n`map\u003cboolean, boolean\u003e` columns, verifying projected `$data_files` and\n`$files` queries."
    },
    {
      "commit": "662dc3599db790fa1b3d0e3bf455abace1561b22",
      "tree": "0e76d96e99add72aeba514a7cad417d661fc9abd",
      "parents": [
        "039a1f0909f14a2c1942e9605fe904e2b22d34e9"
      ],
      "author": {
        "name": "Wen Zhenghu",
        "email": "wenzhenghu.zju@gmail.com",
        "time": "Fri Jul 17 14:58:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 14:58:17 2026 +0800"
      },
      "message": "[fix](fe) Preserve delete fallback failure cause (#65697)\n\n### What problem does this PR solve?\n\nProblem Summary: \n\nDeleteFromCommand currently falls back to DeleteFromUsingCommand when\npredicate validation fails. If the fallback execution also fails, the\noriginal validation exception is rethrown directly, which hides the\nactual fallback failure and makes troubleshooting difficult. This change\npreserves both failure causes by surfacing the fallback execution error,\nattaching it as the main cause, and keeping the original validation\nfailure as a suppressed exception. The PR also adds FE unit tests to\nverify the merged exception message and the null-message fallback path.\n\n### Release note\n\nImprove FE error reporting when DELETE fallback execution fails after\npredicate validation failure."
    },
    {
      "commit": "039a1f0909f14a2c1942e9605fe904e2b22d34e9",
      "tree": "ded60609afd6d2401c6939ca8ce2b8b844cd2afa",
      "parents": [
        "b357fc249ccd8849cc49ec581ac5f9cd523bbaf2"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Fri Jul 17 14:41:11 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 14:41:11 2026 +0800"
      },
      "message": "[refine](exec) simplify DataQueue eos handling (#65324)\n\n### What problem does this PR solve?\n\nDataQueue exposed its internal child-finish and queue-selection protocol\nto source and sink operators, which made EOS handling duplicated and\ndifficult to follow.\n\nThis change carries EOS through `push_block()` and `DataQueueBlock`,\nmoves child selection into DataQueue, and updates Union and Cache\noperators to consume the unified result. Union sink also forwards each\ninput block immediately instead of retaining a cross-call output\n\n\n### Release note\n\nNone"
    },
    {
      "commit": "b357fc249ccd8849cc49ec581ac5f9cd523bbaf2",
      "tree": "2b9e7c2287201402640c1e5b44f8aa47bf2a09d2",
      "parents": [
        "b7931405b350957fb009090081914526fe7b9111"
      ],
      "author": {
        "name": "minghong",
        "email": "zhouminghong@selectdb.com",
        "time": "Fri Jul 17 14:22:49 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 14:22:49 2026 +0800"
      },
      "message": "[fix](eager-agg)Forbid pushdown when the child aggregate has no GROUP BY keys (#65703)\n\n### What problem does this PR solve?\n\nRelated PR: #63690\n\nProblem Summary:\nDo not push down a scalar aggregate to a join\u0027s child.\n```\nt1 (a, b)\n    (1,10)\n    (2,20);\n    \nSELECT t1.a, SUM(t1.b), t2.c\n    FROM t1, t2\n    WHERE t1.a \u003d 999\n    GROUP BY t1.a, t2.c\n```\nPlan:\n```\nagg(sum(t1.b) groupkey\u003d[t2.c])\n  -\u003enlj\n      -\u003efilter(a \u003d 999)\n           -\u003et1\n      -\u003e t2\n```\nWithout eager-agg, the output of the nlj is empty, because all rows of t1 are\nfiltered out.\nIf we push the agg down above the filter, it becomes a scalar agg, since t1.a is\neliminated by constant propagation.\nThe plan becomes:\n```\nagg1(sum(x) groupkey\u003d[t2.c])\n  -\u003enlj\n     -\u003e agg2(sum(t1.b) as x, groupkey\u003d[])\n         -\u003efilter(a \u003d 999)\n             -\u003et1\n      -\u003e t2\n```\nagg2 outputs at least one row even when its input is empty, so the nlj output is no longer empty (as long as t2 is not empty), and the final result is incorrectly non-empty."
    },
    {
      "commit": "b7931405b350957fb009090081914526fe7b9111",
      "tree": "7bdfbbfe9c805f94eeb9350f865ffef4c848cb67",
      "parents": [
        "9a5afd4958f2794a2dd458b2e7463315e9746ce3"
      ],
      "author": {
        "name": "morrySnow",
        "email": "zhangwenxin@selectdb.com",
        "time": "Fri Jul 17 14:18:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 14:18:32 2026 +0800"
      },
      "message": "[fix](aggregate) Check all aggregate function arguments (#65537)\n\n### What problem does this PR solve?\n\nProblem Summary: NormalizeAggregate only checked the first child of each\naggregate function for nested aggregate functions. Aggregate functions\nin later arguments, such as SUM in the ORDER BY argument of\nGROUP_CONCAT, could bypass analysis validation. Check every aggregate\nchild and add FE unit and regression coverage for the invalid query.\n\n### Release note\n\nReject nested aggregate functions in every aggregate function argument."
    },
    {
      "commit": "9a5afd4958f2794a2dd458b2e7463315e9746ce3",
      "tree": "324c325f5e9ff58808f591ba15c37c0487378aee",
      "parents": [
        "3a75387e61388bd886eeef38b794d5ed4eb298bf"
      ],
      "author": {
        "name": "OneG",
        "email": "45254543+GJ100@users.noreply.github.com",
        "time": "Fri Jul 17 14:08:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 14:08:40 2026 +0800"
      },
      "message": "[fix](arrow-flight-sql) Close temporary VectorSchemaRoot in createPreparedStatement to fix FE direct memory leak (#65311)\n\n### What problem does this PR solve?\n\nIssue Number: close #65305\n\nProblem Summary:\n\n`DorisFlightSqlProducer#createPreparedStatement` creates two `VectorSchemaRoot`\ninstances (an empty one for the parameter schema, and one from\n`FlightSqlChannel#createOneOneSchemaRoot(...)` for the result metadata),\nextracts only their `Schema`, and never closes them.\n\n`createOneOneSchemaRoot` allocates a `VarCharVector` from the channel\n`RootAllocator` (off-heap, Netty pooled direct buffer). Since the returned\n`VectorSchemaRoot` is never closed, its off-heap buffer is leaked on every\nprepare call (effectively every Arrow Flight query, because ADBC prepares each\nstatement). The buffer cannot be reclaimed by GC (strongly referenced by the\nallocator) nor by closing the client session (`FlightSqlChannel#close()` only\ninvalidates the result cache and does not close the allocator; the leaked root\nis never stored in any session map).\n\nUnder continuous Arrow Flight query load this makes FE direct memory grow\nmonotonically until:\n`java.lang.OutOfMemoryError: Cannot reserve ... bytes of direct buffer memory (Internal; Prepare)`\n\nThis PR wraps both temporary roots in try-with-resources so their off-heap\nbuffers are released after the `Schema` is extracted. `Schema` is an immutable\nPOJO and remains valid after the root is closed. Compare with\n`getCatalogs`/`getSchemas`/`getTables` in the same producer, which already use\ntry-with-resources.\n\n---------\n\nSigned-off-by: 柳吟风 \u003c523684989@qq.com\u003e\nSigned-off-by: morningman \u003cyunyou@selectdb.com\u003e\nCo-authored-by: 柳吟风 \u003c523684989@qq.com\u003e\nCo-authored-by: morningman \u003cyunyou@selectdb.com\u003e"
    },
    {
      "commit": "3a75387e61388bd886eeef38b794d5ed4eb298bf",
      "tree": "d9271107709531a84941f44b89776078c553c38b",
      "parents": [
        "14331e4974d33085d070ca23b991f1ffcb4e39fb"
      ],
      "author": {
        "name": "Raghvendra Singh",
        "email": "raghav@cashify.in",
        "time": "Fri Jul 17 11:23:52 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 13:53:52 2026 +0800"
      },
      "message": "[fix](catalog) re-auth Iceberg REST catalog on 401 instead of wedging until FE restart (#64966)\n\n### What problem does this PR solve?\n\nProblem Summary:\n\nAn Iceberg REST catalog using oauth2 client credentials obtains its\ntoken when the catalog client is built and keeps it refreshed in the\nbackground. If a refresh permanently fails (for example the auth server\nis briefly unreachable or overloaded right when the refresh fires), the\nclient is left holding a stale token, and every subsequent request fails\nwith `NotAuthorizedException: Not authorized` (HTTP 401) — forever. The\nclient has no re-authentication path of its own, and neither `REFRESH\nCATALOG` nor metadata-cache invalidation rebuilds it, so the catalog\nstays unusable until the FE process is restarted.\n\nWe hit this in production against an Apache Polaris REST catalog. The\nfailure mode is nasty to diagnose because the 401 is swallowed:\n`getDbNullable`/`buildDbForInit` report `Database [\u003cdb\u003e] does not\nexist`, `SHOW DATABASES` returns a partial list, and since DDL\nstatements (`CREATE VIEW`/`CREATE MTMV`/CTAS/`INSERT`) are\n`ForwardToMaster`, one wedged master FE breaks all DDL cluster-wide\nwhile plain SELECTs on follower FEs keep working. Only an FE restart\nrecovered.\n\n### What is changed?\n\nRecovery now lives entirely at the REST layer (per review feedback —\nthanks @CalvinKirs): a new `ReauthenticatingRestSessionCatalog` (a\n`BaseViewSessionCatalog`) that `IcebergRestProperties` wraps around the\n`RESTSessionCatalog` it already builds and owns. 
On a 401 raised by a\nrequest made under the catalog\u0027s own identity, the wrapper rebuilds the\ndelegate through the same build seam (fresh HTTP client, fresh OAuth2\ntoken fetch), closes the wedged client, and retries the operation once.\nIf the retry also fails, the error propagates unchanged.\n\nDesign notes:\n\n- **REST-only.** Only `IcebergRestProperties` constructs the wrapper; no\nother catalog type sees this logic. `IcebergMetadataOps` is untouched\napart from widening two declared types from `RESTSessionCatalog` to\n`BaseViewSessionCatalog`.\n- **All paths covered, no call-site changes, no stale references.** The\ndefault catalog (`asCatalog(empty)`), the view catalog\n(`asViewCatalog(empty)`), and per-user delegated sessions are all thin\nviews that call back into the session catalog, so they inherit the\nrecovery automatically — and because the swap happens below them, no\nholder of any of those references ever points at a dead client after\nrecovery.\n- **Reads and mutations both retried.** A 401 is rejected by the server\nbefore the request is processed, so retrying after re-authentication\ncannot double-apply an operation.\n- **Per-user delegated sessions are excluded.** A 401 on a request\ncarrying a delegated (per-user) credential means that user\u0027s token is\ninvalid; rebuilding the shared client cannot fix it and must not be\ntriggered by it. Likewise, when catalog initialization itself runs under\na delegated credential (`iceberg.rest.session\u003duser`), no wrapper is\ninstalled, so the rebuild path can never capture a user token into the\nshared client.\n- **Typed detection only.** `NotAuthorizedException` anywhere in the\ncause chain; no message-string matching.\n- **Bounded.** One rebuild + one retry per failed request; concurrent\n401s coalesce into a single rebuild (a request that raced with a\ncompleted rebuild just retries on the fresh client).\n\nBoundary: recovery triggers on catalog-level operations. 
Objects already\nhanded out (a loaded `Table`/`View`, a `buildTable`/`buildView` builder)\nkeep the client they were created with; if the token expires between\nobtaining and using one, that single use can still 401, and the next\ncatalog-level operation heals the client.\n\n### Behavior changed:\n\n- Iceberg REST catalogs recover automatically from an expired/rejected\nOAuth2 token (401) by re-authenticating and retrying once, instead of\nfailing every request until the FE restarts. Other catalog types and\nnon-401 failures are unaffected.\n\nSigned-off-by: Raghvendra Singh \u003craghav@cashify.in\u003e\nCo-authored-by: Claude \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "14331e4974d33085d070ca23b991f1ffcb4e39fb",
      "tree": "a9aaa4f49a95c35219e03ed5628438b54f571a45",
      "parents": [
        "9dd68334c5d83ff1dd5add93346763133e37f416"
      ],
      "author": {
        "name": "hui lai",
        "email": "laihui@selectdb.com",
        "time": "Fri Jul 17 12:57:06 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 12:57:06 2026 +0800"
      },
      "message": "[enhance](cloud) Increase get version timeout to 30 seconds (#65625)\n\n### What problem does this PR solve?\n\nThe default timeout for FE get_version requests to MetaService is 3\nseconds, which is too short when MetaService responds slowly. Increase\nthe default timeout to 30 seconds to reduce premature request failures."
    },
    {
      "commit": "9dd68334c5d83ff1dd5add93346763133e37f416",
      "tree": "3df70dbd8a1b61e60def5e7b3118573a00ba29b1",
      "parents": [
        "007e75666b5e20cb0a70c3f63e39cd8756345aa4"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Fri Jul 17 12:15:09 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 12:15:09 2026 +0800"
      },
      "message": "[test](fe) Cover internal query audit failure in FE unit test (#65696)\n\n### What problem does this PR solve?\n\nThe previous regression coverage for internal query audit failures\ndepended on querying `__internal_schema.audit_log` after a\nfault-injection failure. That path is timing-sensitive because audit\npersistence is asynchronous, so the regression can fail even when\n`StmtExecutor.executeInternalQuery()` already reports the error audit\nevent correctly.\n\nThis PR replaces that flaky coverage with a deterministic FE unit test.\nThe new test forces a planning failure in `executeInternalQuery()` and\nverifies that the submitted `AuditEvent` is marked as `ERR` and carries\nthe failure details. The unstable fault-injection regression case is\nremoved."
    },
    {
      "commit": "007e75666b5e20cb0a70c3f63e39cd8756345aa4",
      "tree": "4666d4fd4478f68b0edd1726c4b428719cf54c06",
      "parents": [
        "a99acfc7a406a29a08213f2cfa5bed80055181be"
      ],
      "author": {
        "name": "Chenyang Sun",
        "email": "sunchenyang@selectdb.com",
        "time": "Fri Jul 17 12:01:45 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 12:01:45 2026 +0800"
      },
      "message": "[refactor](be) move storage field type conversion to core (#65717)\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test\n    - [x] Unit Test"
    },
    {
      "commit": "a99acfc7a406a29a08213f2cfa5bed80055181be",
      "tree": "89c07ca29773cc6bcc4923208dfdd31d8a31159a",
      "parents": [
        "f697dfa973c32f5bf59639bc1b31602a8caf6da6"
      ],
      "author": {
        "name": "Mingyu Chen (Rayner)",
        "email": "yunyou@selectdb.com",
        "time": "Fri Jul 17 11:47:29 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 11:47:29 2026 +0800"
      },
      "message": "[improvement](be-java-extensions) Drop hive-catalog-shade from be-java-extensions (#65733)\n\nRemove the two consumers of the ~122MB\norg.apache.doris:hive-catalog-shade fat jar under fe/be-java-extensions.\n\n1. Delete the dead avro-scanner module. Its AvroJNIScanner is never\nrouted to (no FE code references the class name it would be loaded by),\nit is absent from the shared preload-extensions jar, and its only\nregression test (test_tvf_avro.groovy) is entirely commented out.\nRemoves the module, its be-java-extensions/pom.xml and build.sh\nregistrations, and the dead test plus its .out data.\n\n2. Slim java-udf: replace the hive-catalog-shade dependency with a new\nminimal hive-udf-shade module that shades hive-exec:3.1.3:core down to\nonly the Hive UDF contract classes a user UDF extends -- ql.exec.UDF and\nits load-time closure (DefaultUDFMethodResolver/UDFMethodResolver),\nql.exec.Description, and ql.metadata.HiveException. The shaded jar is 20\nKB / 10 classes; java-udf\u0027s fat jar goes from 12807 hive classes to 10\n(all metastore/serde2/objectinspector/relocated-thrift dropped).\n\nThe fe/pom.xml version pin and the fe-core / fe-connector-hms /\nfe-common consumers still depend on hive-catalog-shade and are\nintentionally left untouched (out of scope)."
    },
    {
      "commit": "f697dfa973c32f5bf59639bc1b31602a8caf6da6",
      "tree": "f1491201f1d426c08363603ac3d203b7289efb0f",
      "parents": [
        "589fd3e9a1e8bd05d2baba5533e20f164db9854c"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Fri Jul 17 11:46:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 11:46:08 2026 +0800"
      },
      "message": "[fix](fe) Revert semi join constraint matching fix (#65546)\n\nRelated PR: #65205\n\nProblem Summary:\n\nPR #65205 introduced a correctness issue in leading hint planning by\nchanging how semi/anti join constraints are matched. This PR reverts\nmerge commit `cbccdafa41adfff73e683ac1dfcae46638c16d48` in full,\nrestoring the previous planner behavior while a correct fix is prepared."
    },
    {
      "commit": "589fd3e9a1e8bd05d2baba5533e20f164db9854c",
      "tree": "75246532d2d55d8807acbc1af6807b28c1864472",
      "parents": [
        "2ec58216929c12ff9aff9f043627ab8984628dad"
      ],
      "author": {
        "name": "924060929",
        "email": "lanhuajian@selectdb.com",
        "time": "Fri Jul 17 11:19:10 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 11:19:10 2026 +0800"
      },
      "message": "[opt](local shuffle) support bucket shuffle for set operation (#65129)\n\nRelated PR: #59006, #60823\n\nProblem Summary:\n\nRe-enable bucket shuffle for set operation (union / intersect / except).\n\nThis optimization was introduced by #59006 and later disabled by #60823,\nbecause at that time the backend could not plan the correct local\nshuffle type for set operation, which produced wrong results. Now that\nplanning local shuffle in the frontend is supported, the frontend plans\nthe correct local shuffle type, so it is safe to re-enable this feature.\n\nWhen `enable_local_shuffle_planner` is enabled (the default), the\noptimizer keeps the distribution of the largest natural /\nstorage-bucketed child of a set operation and bucket-shuffles the other\nchildren onto it (the same idea as bucket shuffle join), which avoids a\nfull reshuffle of the largest side. When it is disabled, the previous\nfull-shuffle behavior is kept, so this change is gated and only takes\neffect with the frontend local shuffle planner.\n\nBesides re-enabling the planning, two local-exchange alignment problems\nthat the re-enabled shapes expose are fixed (both verified on a 4-BE\ncluster with a stable wrong-result reproduction before the fix and\ncorrect results after):\n\n1. `SetOperationNode.enforceAndDeriveLocalExchange` only handled the\ncolocate mode. A bucket-shuffle intersect / except whose basic child is\na join output (instead of a direct scan) fell into the partitioned\nbranch, so the basic side was locally re-partitioned by execution hash\nwhile the other side stayed bucket-distributed, and the set operation\nlost rows. Now the bucket-shuffle mode takes the same\n`requireBucketHash` path as colocate, mirroring `HashJoinNode`.\n\n2. 
`RequireSpecific.autoRequireHash()` degraded\n`LOCAL_EXECUTION_HASH_SHUFFLE` to the generic hash requirement.\nPass-through operators (union / streaming agg / sort) forward this\nrequirement to children while claiming the specific type to their\nparent, so under a bucket join upgraded to local hash, a\nbucket-distributed child could satisfy the generic requirement and keep\nits bucket placement while the parent skipped its re-align local\nexchange, and the mixed placements computed wrong results. A specific\nhash requirement is now forwarded as-is.\n\nThe regression suite disables the bucket shuffle downgrade so the chosen\nshapes do not depend on the backend count / parallelism of the\nenvironment, and adds a case where the basic child of a bucket-shuffle\nintersect is a join output."
    },
    {
      "commit": "2ec58216929c12ff9aff9f043627ab8984628dad",
      "tree": "5f1465264dee454cd24bedae46f9b6956dc32898",
      "parents": [
        "4df43e95c39c5d844c17723b624f06e1e646f48d"
      ],
      "author": {
        "name": "zclllyybb",
        "email": "zhaochangle@selectdb.com",
        "time": "Fri Jul 17 10:31:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 10:31:20 2026 +0800"
      },
      "message": "[fix](be) Guard empty stack trace formatting (#65731)\n\nAn empty or too-short libunwind stack trace underflows the\nremaining-frame count in `StackTrace::toString()` after it skips its\ninternal frames. Formatting an error `Status` then reads beyond the\nfixed frame array and can crash the BE.\n\nTreat non-positive stack-capture results as empty, render a trace with\nfewer frames than the requested skip as the existing `\u003cEmpty trace\u003e`,\nand preserve signal-context offsets after slicing. A debug-point BEUT\ninjects an empty trace into the real `Status::NotFound` capture/logging\npath."
    },
    {
      "commit": "4df43e95c39c5d844c17723b624f06e1e646f48d",
      "tree": "bbdb84076dd0b0f892dd89146c0a8a9c302692be",
      "parents": [
        "ecf7732adbdb2cdbfb926ab2ba6bc83e86c9f80e"
      ],
      "author": {
        "name": "zhangstar333",
        "email": "zhangsida@selectdb.com",
        "time": "Fri Jul 17 10:16:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 10:16:40 2026 +0800"
      },
      "message": "[chore](thirdparty) introduce thridparty about lance-c (#65304)\n\n### What problem does this PR solve?\nintroduce thridparty about lance-c\nhttps://github.com/lance-format/lance-c"
    },
    {
      "commit": "ecf7732adbdb2cdbfb926ab2ba6bc83e86c9f80e",
      "tree": "1910fec886b2d15d268b8e463043c888703868f5",
      "parents": [
        "728d362448a25051293291b42985a79f91c0122c"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Fri Jul 17 08:49:26 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jul 17 08:49:26 2026 +0800"
      },
      "message": "[fix](function) Support cross-dimensional casts for null arrays (#65456)\n\n### What problem does this PR solve?\n\n\nWhen constant folding was disabled, BE rejected casts from null-only\narrays to arrays with a different number of dimensions, although FE\nconstant folding already supported them. Root cause: the BE array cast\nchecked dimensionality before recognizing that the source element type\nrepresented a NULL literal.\n\nThis change allows cross-dimensional array casts when the source nested\ntype is a NULL literal, while ordinary arrays still require matching\ndimensions. For example, `CAST([NULL, NULL] AS ARRAY\u003cARRAY\u003cINT\u003e\u003e)` now\nreturns `[NULL, NULL]`: BE casts the two nested NULL values to\n`Nullable\u003cArray\u003cInt\u003e\u003e` and rebuilds the outer array with its original\noffsets.\n\n### Release note\n\nSupport casting null-only arrays to array types with different\ndimensions in BE.\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "728d362448a25051293291b42985a79f91c0122c",
      "tree": "3d0cf64858b26021d9c1f2e2ed1f26fd32859bfd",
      "parents": [
        "db0b39a4f1ab56811b9be9662d80d7c65435ab45"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 21:06:34 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 21:06:34 2026 +0800"
      },
      "message": "[fix](test) Stabilize BDB unmatched transaction test (#65587)\n\n\nRelated PR: #62221\n\nProblem Summary:\n\n`BDBEnvironmentTest.testReadTxnIsNotMatched` exercises the case where a\ntransaction is committed locally but majority acknowledgement fails,\nthen verifies that the old master can still read the local transaction\nafter restart.\n\nThe test had two timing/lifecycle gaps:\n\n1. It read the initial record from followers immediately after the\nmaster write, so a follower could legitimately return `NOTFOUND` before\nreplication became visible.\n2. The Mockito `RepImpl` spy used to inject `InsufficientAcksException`\nremained installed during environment shutdown and restart, allowing the\nmock to affect unrelated BDB lifecycle work.\n\nThis change waits for the baseline value on each follower, limits the\nspy to the injected write and restores the original implementation in\n`finally`, and verifies the exact locally committed value after restart.\n\nThe assertion is intentionally limited to local durability.\n`InsufficientAcksException` does not guarantee that the record was never\ndelivered to a follower before acknowledgement failed."
    },
    {
      "commit": "db0b39a4f1ab56811b9be9662d80d7c65435ab45",
      "tree": "7cecd43611f4dbd9d203243a06bdb8b768c055f9",
      "parents": [
        "5abd40cc536f97030ddc0334f3e2220420090abf"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 21:04:07 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 21:04:07 2026 +0800"
      },
      "message": "[fix](be) Make replica fault injection deterministic (#65627)\n\nRelated PR: #47082\n\nProblem Summary:\n\n`StreamSinkFileWriter` previously implemented the one/two-replica\nfault-injection debug points by skipping the first entries in each\nwriter\u0027s local stream order. Writers for the same tablet can have\ndifferent stream orders, so they could skip different destination\nbackends. A nominal one-replica failure could therefore affect two\nreplicas across writers and make the load lose quorum.\n\nThis PR derives the failed replica set from sorted destination backend\nIDs. Every writer with the same replica set now skips the same\nbackend(s), independent of local stream order. It also adds unit\ncoverage for one- and two-replica injection with reordered stream lists."
    },
    {
      "commit": "5abd40cc536f97030ddc0334f3e2220420090abf",
      "tree": "9b8dff4a0c40e6b658b63decf14abd31353e88ae",
      "parents": [
        "87d57a84729b61fcdd2e00e4e10697b46b2a82e9"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 21:03:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 21:03:40 2026 +0800"
      },
      "message": "[fix](regression) Disable auto compaction for physical row inspections (#65536)\n\nRelated PR: #65211, #64508\n\nProblem Summary:\n\nSome unique-key MOW regression cases deliberately bypass the delete\nbitmap and assert historical physical rows, sequence values, skip\nbitmaps, and rowset versions. These assertions require the\npre-compaction rowsets to remain available, but the affected tables\nleave automatic compaction enabled. Background compaction can therefore\nmerge the expected old rowsets before `qt_inspect` or an equivalent\nraw-row query runs, while all user-visible SQL results remain correct.\n\nThis PR disables table-level automatic compaction only for the tables\nwhose assertions intentionally depend on historical physical layout. It\ncovers nine case files, including publish-conflict, read-from-old,\ndelete-sign, sequence-column, and auto-increment physical-row checks.\nGolden results and user-visible correctness assertions are unchanged.\n\nThe source scan reviewed all master cases that set\n`skip_delete_bitmap\u003dtrue` without already disabling automatic\ncompaction. Four remaining files were intentionally excluded because the\nvariable is used only for debug-mode validation, routine-load waiting,\ninsert execution setup, or binlog TVF behavior; they do not assert\nhistorical table rowsets."
    },
    {
      "commit": "87d57a84729b61fcdd2e00e4e10697b46b2a82e9",
      "tree": "4f22b532c4445353c489880f67bfb4ffd3dc8884",
      "parents": [
        "fd16ebdc331fec44ad618e7453a4cdd01bba48e7"
      ],
      "author": {
        "name": "Calvin Kirs",
        "email": "guoqiang@selectdb.com",
        "time": "Thu Jul 16 20:35:41 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 20:35:41 2026 +0800"
      },
      "message": "[feat](fs) Migrate HDFS properties into fe-filesystem; fix OSS listing/URI issues (#64695)\n\n## What\n\nThis PR contains two related fe-filesystem changes:\n\n### 1. Migrate kernel HDFS properties into fe-filesystem\n\nThe HDFS SPI module previously passed raw user properties straight into\na Hadoop `Configuration` without any translation. When fed raw\nproperties it silently dropped the typed authentication params — most\nimportantly `hdfs.authentication.kerberos.principal/keytab` were never\nmapped to `hadoop.kerberos.principal/keytab`, so Kerberos was not\nrecognized and the client fell back to simple auth without any error.\n\nThis change ports the kernel `HdfsProperties` inheritance chain into a\nnew package `org.apache.doris.filesystem.hdfs.properties`, preserving\nthe parameter translation, xml-resource loading, and default injection,\nwhile keeping the module free of fe-core / fe-common (only the\nalready-allowed `fe-foundation` dependency is added):\n\n- Copies `ConnectionProperties`, `StorageProperties` (minus the\nprovider-registry static factory, which referenced every storage\nprovider), `HdfsCompatibleProperties` (minus the `hadoopAuthenticator`\nfield that depended on fe-common security), `HdfsProperties`, and\n`HdfsPropertiesUtils`.\n- Adds a minimal local `UserException` stub (the original drags in\n`ErrorCode`/`InternalErrorCode`) and a local `HdfsConfigFileLoader`\nreplacing `CatalogConfigFileUtils`.\n- `HdfsFileSystemProvider.create()` now resolves raw properties through\n`HdfsProperties` before constructing `DFSFileSystem`.\n- The FE-side hadoop config directory is passed down as a\nsystem-injected context key `_HADOOP_CONFIG_DIR_` (set by\n`StoragePropertiesConverter`), so the filesystem module can resolve\n`hadoop.config.resources` without depending on fe-core `Config`.\n\n### 2. 
OSS listing / exception / URI fixes\n\n- Fix OSS recursive list truncation by falling back to the last returned\nkey when `NextMarker` is absent.\n- Catch `OSSException` (a sibling of `ClientException`) so server-side\nOSS errors are translated to `IOException` instead of leaking.\n- Consolidate object-storage URI parsing into a single shared,\nscheme-agnostic parser and align path-style handling across\nS3/OSS/COS/OBS.\n\n## Tests\n\n- `fe-filesystem-hdfs`: full module test suite passes, including the new\n`HdfsPropertiesTest` (Kerberos translation, simple-auth fallback\ndefault, user-overridden keys preserved, xml load from injected/absolute\npaths, idempotency). 0 checkstyle violations.\n- A focused `StoragePropertiesConverter` test asserting\n`_HADOOP_CONFIG_DIR_` injection is added on the fe-core side.\n- The fe-core module could not be fully compiled in the local\nenvironment; CI validates that side."
    },
    {
      "commit": "fd16ebdc331fec44ad618e7453a4cdd01bba48e7",
      "tree": "3b57d7c36c1cf29ba44fbc5410efaa3be6a25b74",
      "parents": [
        "10de7ac06145a3f1f0cca0e70f07c4f51dee0365"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 19:40:02 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 19:40:02 2026 +0800"
      },
      "message": "[fix](regression) Select compaction profile BE by tablet replica (#65552)\n\nRelated PR: #63886\n\nProblem Summary:\n\n`test_compaction_profile_action` queried `/api/compaction/profile` on\nthe first backend returned by `SHOW BACKENDS`, independently of the\ntarget tablet placement. The profile API reads the selected BE\u0027s\nprocess-local compaction tracker, so the tablet filter returned an empty\nlist whenever that backend was the one BE outside the target tablet\u0027s\nthree-replica set.\n\nThe branch-4.1 P0 history shows 13 failures in 32 runs. In every\nfailure, the queried first backend was outside the target replica set;\nin every passing run, it hosted a target replica.\n\nThis change derives the profile endpoint from the `BackendId` in the\nsame `SHOW TABLETS` row as the selected `TabletId`. The case therefore\nqueries a replica that participates in `trigger_and_wait_compaction`\nwhile preserving all existing API assertions."
    },
    {
      "commit": "10de7ac06145a3f1f0cca0e70f07c4f51dee0365",
      "tree": "1c244bcd2e27861bfb3ee4befcf1df3a4003300a",
      "parents": [
        "d983219f5bbcf49bf46f214ae71ef42af57bbb42"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 18:15:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 18:15:08 2026 +0800"
      },
      "message": "[fix](test) stabilize file cache statistics case (#65666)\n\nRelated PR: #65261\n\nProblem Summary:\n\n`test_file_cache_statistics` verifies that a repeated Hive Parquet read\nreaches\nBlockFileCache and increments both `total_hit_counts` and\n`total_read_counts`.\n\nPR #65261 made the suite `nonConcurrent`, aggregated counters across\ncache\npaths, and polled asynchronous metrics. However, the final counter\nassertion can\nstill pass or fail with identical relevant code:\n\n- External Regression 995803: PASS, counters `7503/8945 -\u003e 7504/8946`\n- External Regression 996006: muted FAIL, the two-counter condition\ntimed out\n- External Regression 996177: PASS, counters `7506/8947 -\u003e 7507/8948`\n\n`nonConcurrent` only prevents other suites from running concurrently. It\ndoes\nnot serialize file scan ranges inside one SQL statement. The original\nquery\nscans a six-file partitioned table without partition predicates, uses\nautomatic\npipeline parallelism, and stops at `LIMIT 1`. Which scan ranges perform\nreads\nbefore cancellation therefore depends on scheduling.\n\nParquet file page cache is also enabled by default. It can serve cached\npage\nheaders and payloads before the underlying file reader reaches\nBlockFileCache.\nIts admission is selective and uses shared BE cache state. 
As a result,\nthe\nthird query may either touch one BlockFileCache block or be completed\nfrom the\nupper page cache, while both behaviors return the correct query result.\n\nThis is a test isolation issue: a case that asserts BlockFileCache\ncounter\ndeltas must make the read path deterministic and isolate the cache layer\nunder\ntest.\n\nThis PR:\n\n- explicitly disables SQL cache, Hive SQL cache, and Parquet file page\ncache for\n  this BlockFileCache statistics case;\n- sets `parallel_pipeline_task_num\u003d1` to remove internal scan-instance\nraces;\n- adds `nation\u003d\u0027cn\u0027 and city\u003d\u0027beijing\u0027` predicates so the query scans\nthe\n  single partition file containing the existing expected row.\n\nThe output oracle and all file-cache metric assertions are unchanged."
    },
    {
      "commit": "d983219f5bbcf49bf46f214ae71ef42af57bbb42",
      "tree": "fc9669cc5e7742d69b3ceb16f396211b139170b8",
      "parents": [
        "532042c4dada051fa8ea3692eaf2f4c2e14a7db3"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 18:12:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 18:12:17 2026 +0800"
      },
      "message": "[fix](regression) Restore time zone after test_tz_load (#65412)\n\n## Summary\n- restore the original global time_zone after test_tz_load changes it\nduring the suite\n- clean up the temporary global_timezone_test table in the finally block\n- prevent later nonConcurrent suites from inheriting UTC global time\nzone\n\n## Evidence\n- master has the same test_tz_load pattern: it sets GLOBAL time_zone to\nUTC and previously did not restore it before suite exit\n- test_json_load contains a timezone-sensitive from_unixtime assertion\nthat can be affected by the leaked global time zone\n- same branch-4.1 fix was submitted in #65385"
    },
    {
      "commit": "532042c4dada051fa8ea3692eaf2f4c2e14a7db3",
      "tree": "236d9284ccfaaf6b2893a85525ebf50ace13aff4",
      "parents": [
        "52cdeb6c4cf88a629ec38258e22fe7333f4c57a6"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 18:08:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 18:08:08 2026 +0800"
      },
      "message": "[fix](regression) Drop view before its base tables in test_join5 (#65455)\n\n### What problem does this PR solve?\n\nIssue Number: DORIS-23587\n\nRelated PR: None\n\nProblem Summary:\n\n`test_join5` creates the `uqv1` view on top of `uq1`, but drops `uq1`\nwhile the view still exists. A concurrent\n`information_schema.view_dependency` scan can observe this dangling view\nand fail while resolving its missing base table.\n\nThis change drops `uqv1` after its last assertion and before dropping\nits base tables, so the case does not expose an invalid intermediate\nstate to concurrently running regression suites."
    },
    {
      "commit": "52cdeb6c4cf88a629ec38258e22fe7333f4c57a6",
      "tree": "c80894c78b4b550573c109cd3e0dd0964119d2d9",
      "parents": [
        "e40d599a2ac22c9deaff7ad3badf6ff3ff3acbb7"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 17:58:28 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 17:58:28 2026 +0800"
      },
      "message": "[fix](regression) Wait for show config output before parsing (#65486)\n\nProblem Summary:\n\n`test_mow_table_with_format_v2` used `consumeProcessOutput` to start\nasynchronous stdout and stderr reader threads, then called\n`Process.waitFor()` and immediately parsed the stdout buffer.\n`waitFor()` only waits for the curl process and does not join the reader\nthreads, so the case could intermittently parse an empty buffer even\nthough `/api/show_config` returned a complete response.\n\nThe same failure signature appeared in P0 build 992109 and Cloud P0\nbuild 983996: curl exited with code 0 and reported receiving the\nresponse, while the captured stdout was still empty at parse time.\n\nThis change uses `waitForProcessOutput` to wait for both reader threads\nbefore reading the exit value and parsing the JSON response. The product\nassertions and test intent are unchanged."
    },
    {
      "commit": "e40d599a2ac22c9deaff7ad3badf6ff3ff3acbb7",
      "tree": "43f521e52f2cb640332e7ec0caf1d267ad4d9140",
      "parents": [
        "b4aa99ecbd405c938779ebac856938a3792babe3"
      ],
      "author": {
        "name": "seawinde",
        "email": "wusi@selectdb.com",
        "time": "Thu Jul 16 17:33:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 17:33:04 2026 +0800"
      },
      "message": "[fix](be) Fix row binlog writer test metadata (#65686)\n\nRelated PR: #65076, #65663\n\nProblem Summary:\nThe master BE unit test target fails to compile because\n`GroupRowsetWriterTest` still passes a pointer to\n`allocate_binlog_lsn()`, which now accepts a vector reference. After\nfixing compilation, the suite aborts while flushing row-binlog data.\n\nRoot cause: In `GroupRowsetWriterTest.SetUp()`, two columns are removed\nfrom the generated row-binlog schema without updating `binlog_tso_idx`,\n`binlog_lsn_idx`, and `binlog_op_idx`. `RowBinlogSegmentWriter`\nconsequently accesses column ID 7 in a seven-column schema. The test\noutput assertions also use the previous LSN, operation, and timestamp\nlayout.\n\nThis change initializes the shared LSN vector before passing it by\nreference, resets the metadata indexes after filtering columns, and\nverifies TSO, LSN, and operation values using the current schema order\nand types."
    },
    {
      "commit": "b4aa99ecbd405c938779ebac856938a3792babe3",
      "tree": "0dbac1c3e4c99942fcc69a93d04e39a806b1eb23",
      "parents": [
        "e7b0e4e22d538447f3bd31ecf89c5d2cf5b16665"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 17:00:31 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 17:00:31 2026 +0800"
      },
      "message": "[fix](regression) Make incomplete commit info injection deterministic (#65633)\n\nRelated PR: #53979\n\nProblem Summary:\n\n`test_incomplete_commit_info` expects stream load to fail after the\ndebug point removes tablet commit information. The debug point\npreviously skipped the first tablet independently in every\n`VNodeChannel`. With multiple buckets, those skipped entries could\nbelong to different tablets, so every tablet could still retain the\nrequired replica quorum and the load would legally succeed.\n\nThis change makes the injection deterministic:\n\n- the debug point skips commit info only for an explicitly supplied\n`tablet_id`;\n- the regression case keeps its five-bucket layout, performs a\nsuccessful seed load, then uses tablet-scoped queries to select a tablet\nthat actually received rows;\n- the fault-injection load reuses the same input, so hash routing\nguarantees that the selected tablet participates again;\n- the debug log records both the selected tablet and destination\nbackend.\n\nNormal load behavior is unchanged when debug points are disabled. The\ndebug point has only this one in-repository caller, which is updated in\nthe same change."
    },
    {
      "commit": "e7b0e4e22d538447f3bd31ecf89c5d2cf5b16665",
      "tree": "61c59284b9a06ad617e7eca7be082766617badeb",
      "parents": [
        "c70cf12a8417b1eff336171faa4f6294dad51670"
      ],
      "author": {
        "name": "shuke",
        "email": "shuke@selectdb.com",
        "time": "Thu Jul 16 14:12:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 14:12:17 2026 +0800"
      },
      "message": "[fix](regression) Run restore auth case non-concurrently (#65668)\n\nProblem Summary:\n\nThe repository authorization regression case can fail for reasons\nunrelated to authorization when it runs concurrently with\ntest_ddl_restore_auth.\n\nIn P0 build 996774, test_ddl_restore_auth submitted BACKUP SNAPSHOT at\n19:40:11.661. The repository authorization case submitted its authorized\nDROP REPOSITORY only 26 ms later. The drop exhausted BackupHandler\u0027s\n10-second global submission-lock wait at 19:40:21.691, while the backup\nsubmission did not return until 19:40:28.960.\n\nThis change runs test_ddl_restore_auth in the nonConcurrent group,\nconsistent with test_ddl_backup_auth. The authorization assertions and\nbackup/restore behavior are unchanged."
    },
    {
      "commit": "c70cf12a8417b1eff336171faa4f6294dad51670",
      "tree": "ff3fa8d6be4c0ba5b2eb5d5b32a9039e2e630b66",
      "parents": [
        "f1a168db108fbce8bbbb157333754b432df99e5a"
      ],
      "author": {
        "name": "minghong",
        "email": "zhouminghong@selectdb.com",
        "time": "Thu Jul 16 14:11:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 14:11:20 2026 +0800"
      },
      "message": "[opt](regression) remove duplicated tpcds 64 plan shape check (#65652)\n\n### What problem does this PR solve?\n\ntpcds 64 plan is always created by dphyper. \nso keep ds64 in dephyper dir, and remove it from other dirs."
    },
    {
      "commit": "f1a168db108fbce8bbbb157333754b432df99e5a",
      "tree": "ef37e796e8ccff493630c9d2b3bf5135e2233fa8",
      "parents": [
        "163a10a3c44bf298ad964ea84e360667ab3abd99"
      ],
      "author": {
        "name": "Jerry Hu",
        "email": "hushenggang@selectdb.com",
        "time": "Thu Jul 16 13:26:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 13:26:37 2026 +0800"
      },
      "message": "[fix](be) Fix nested array_map lambda captures (#64563)\n\nProblem Summary: Nested `array_map` expressions can capture variables\nfrom an outer lambda while also defining their own lambda arguments. The\nBE previously assigned lambda argument column offsets by recursively\nmutating every `VColumnRef` under the outer lambda. This could make an\ninner lambda argument reuse the outer lambda block offset and read the\nwrong column during `INSERT ... SELECT`, causing a BE crash. This change\nserializes lambda argument metadata to BE, preserves captured input\ncolumns in lambda blocks, and resolves lambda arguments by the current\nruntime lambda scope while respecting nested-lambda shadowing.\n\n`array_sort` comparator arguments are intentionally position-based\nbecause FE may distinguish the two comparator arguments only by column\nid. A comparator that captures an outer lambda argument cannot be\nrepresented in the current two-column comparator block and could\notherwise be silently resolved as one of the comparator arguments. 
This\nPR now returns a clear error for unsupported captured column or slot\nreferences in `array_sort` comparators.\n\nA follow-up refactor moves `LambdaExecutionContext` into\n`be/src/exprs/lambda_function/lambda_execution_context.h`.\n`VExprContext` now keeps the lambda execution context as a private\nimplementation detail and includes the concrete definition only from\n`vexpr_context.cpp`, while lambda functions and `VColumnRef` include the\ndedicated lambda header where they need the concrete type.\n\n### Release note\n\nFix nested `array_map` lambda captures and reject unsupported captured\nreferences in `array_sort` comparators.\n\n### Check List (For Author)\n\n- Test:\n- Unit Test: `./run-be-ut.sh --run\n--filter\u003dArrayMapFunctionTest.NamedLambdaWithFewerArgumentsThanArraysUsesDeclaredBindings:ArrayMapFunctionTest.*:VColumnRefTest.*`\n- Unit Test: `./run-be-ut.sh --run\n--filter\u003dArrayMapFunctionTest.NestedArraySortComparatorCapturingOuterArgumentReturnsError:ArrayMapFunctionTest.NestedArraySortInsideArrayMapSkipsArrayMapArgumentInference`\n    - Unit Test: `./run-be-ut.sh --run --filter\u003dArrayMapFunctionTest.*`\n    - Build: `./build.sh --be --fe`\n    - Build: `./build.sh --be`\n- Regression test: `./run-regression-test.sh --run -d\nquery_p0/sql_functions/array_functions -s test_nested_array_map_insert\n-forceGenOut`\n- Regression test: `./run-regression-test.sh --run -d\nquery_p0/sql_functions/array_functions -s test_nested_array_map_insert`\n- Regression test: `./run-regression-test.sh --run -d\nquery_p0/sql_functions/array_functions -s test_nested_array_map`\n- Regression test: `./run-regression-test.sh --run -d\nquery_p0/virtual_slot_ref -s fix_array_type_and_lambda_func`\n- Regression test: `./run-regression-test.sh --run -d\nquery_p0/virtual_slot_ref`\n- Regression test: `./run-regression-test.sh --run -d\nquery_p0/sql_functions/array_functions`\n- Format check: `./build-support/clang-format.sh`;\n`./build-support/check-format.sh`; `git 
diff --check`\n- Static analysis: attempted `./build-support/run-clang-tidy.sh\n--build-dir be/ut_build_ASAN --base HEAD`, but local clang-tidy analysis\nwas blocked by existing/local diagnostics including missing `stddef.h`,\nunmatched `NOLINTEND` in `be/src/core/types.h`, and pre-existing\nfunction-size/cognitive-complexity diagnostics.\n- Static analysis: attempted `build-support/run-clang-tidy.sh\n--build-dir be/build_Release`, but it is still blocked by the same\nexisting/toolchain diagnostics.\n- Behavior changed: Yes. Nested lambda arguments are resolved by\nFE-provided lambda metadata when available; old nested lambda plans\nwithout metadata fail fast; unsupported captured references in\n`array_sort` comparators now return an error instead of being silently\nposition-bound.\n- Does this need documentation: No"
    },
    {
      "commit": "163a10a3c44bf298ad964ea84e360667ab3abd99",
      "tree": "f314049a77a49f11f3917c514000b42d6f062f21",
      "parents": [
        "27357c32063c0ec46f5cb1678d080a421b3aec2b"
      ],
      "author": {
        "name": "Calvin Kirs",
        "email": "guoqiang@selectdb.com",
        "time": "Thu Jul 16 12:06:28 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 12:06:28 2026 +0800"
      },
      "message": "[improvement](fe) add fe_meta_auth_token for FE meta-service internal HTTP auth (#65551)\n\n### What problem does this PR solve?\n\nThe FE meta-service endpoints\n(`image`/`role`/`check`/`put`/`journal_id`, etc.) authenticate callers\nonly by the `CLIENT_NODE_HOST`/`CLIENT_NODE_PORT` headers — i.e. whether\nthe claimed host:port is a registered FE. That is a plaintext claim with\nno secret, so any host that knows a valid FE address can impersonate it.\n\n### What this PR does\n\nAdd an optional cluster token `fe_meta_auth_token`:\n\n- **Empty (default)** — behavior is unchanged: node-host check only.\nExisting clusters and rolling upgrades are unaffected.\n- **Set** — `checkFromValidFe` additionally requires the request to\ncarry a matching token header, **on top of** the existing node-host\ncheck (additive, does not replace the host check).\n\nThe token is a static `fe.conf` item, so a scaling-out FE already holds\nit before the bootstrap handshake — no chicken-and-egg with the token\nthat `/check` itself hands out. It must be identical on all FEs.\n\nAdditional hardening of the meta-service:\n- `/put` rejects a port other than the FE HTTP port.\n- `/dump` always checks the admin password.\n- meta-helper logs header **names** only, never the token value.\n\n### Tests\n\n- `MetaServiceTest`: matching / missing / wrong token,\nno-token-when-unconfigured, unknown-host-rejected-even-with-token,\n`/put` port check, `/dump` auth.\n- `HttpURLUtilTest`: token header emission (present/absent) and internal\nURL building.\n\nThis supersedes #63782 (rebased onto latest master with a cleaner,\nswitch-free design)."
    },
    {
      "commit": "27357c32063c0ec46f5cb1678d080a421b3aec2b",
      "tree": "5ffc0527239edfc2f9327f28364300d20aff7299",
      "parents": [
        "a6745ebc5cbcaa2e2304653494211bf15747cee8"
      ],
      "author": {
        "name": "TengJianPing",
        "email": "tengjianping@selectdb.com",
        "time": "Thu Jul 16 12:00:48 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 12:00:48 2026 +0800"
      },
      "message": "[fix](be) Track block bloom filter memory (#65578)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: BlockBloomFilter allocated its directory with\nposix_memalign, bypassing Doris Allocator and MemTracker. Use the\ntracked allocator for aligned allocation and release, preserving the\nprevious directory size across reinitialization so accounting remains\nexact.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test: Unit Test\n    - BloomFilterFuncTest.TrackBlockBloomFilterMemory\n- Behavior changed: Yes (Block Bloom Filter directory memory is now\nrecorded by MemTracker)\n- Does this need documentation: No\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "a6745ebc5cbcaa2e2304653494211bf15747cee8",
      "tree": "2553a5864548297bc4686e7b32a652c2b580db4e",
      "parents": [
        "e230e078bc6c48754a0405e3292942afad42f44e"
      ],
      "author": {
        "name": "TsukiokaKogane",
        "email": "cby141994@gmail.com",
        "time": "Thu Jul 16 11:27:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 11:27:04 2026 +0800"
      },
      "message": "[improvement] (table stream) refactor misleading name StreamTableInfo(#65636)"
    },
    {
      "commit": "e230e078bc6c48754a0405e3292942afad42f44e",
      "tree": "bd42d8e737a676afa49b3726fd9b66ef6644be07",
      "parents": [
        "b440a3f6f9408fa99c8e0fc2ec99d5a71ec09361"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Thu Jul 16 10:37:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 10:37:03 2026 +0800"
      },
      "message": "[feature](fe) Let an FE declare and broadcast its local resource group (#65611)\n\nAdds the FE-side topology metadata only. No scheduling, replica\nselection or load behavior changes.\n\nA FE can now declare which resource group (BE `tag.location`) it sits\nin, so that operators and later features can reason about cross-AZ /\ncross-IDC deployments:\n\n- `local_resource_group` FE config, overridable by the\n`--local_resource_group` start_fe.sh option or the\n`DORIS_LOCAL_RESOURCE_GROUP` environment variable. Precedence is command\nline \u003e environment \u003e fe.conf. The value must be a valid `tag.location`\nname, otherwise the FE refuses to start.\n- The value is reported through the FE heartbeat\n(TFrontendPingFrontendResult) and kept on Frontend as runtime-only\nstate, so the master can see every FE\u0027s group.\n- Exposed as a new `LocalResourceGroup` column, appended at the end of\n`SHOW FRONTENDS` and `frontends()` so existing column positions do not\nmove.\n\nBecause the command line value has to override fe.conf, it can only be\nresolved once Config is loaded. DorisFE.parseArgs() is therefore split\ninto parsing (parseArgs) and mode selection (buildCommandLineOptions).\nBoth halves are still called at exactly the point where parseArgs() used\nto be, so the version / helper / image modes keep returning before the\nrecovery and drop-backends system properties are applied, and those\nflags remain ignored in those modes as before. Only\nresolveLocalResourceGroup() runs after Config.init()."
    },
    {
      "commit": "b440a3f6f9408fa99c8e0fc2ec99d5a71ec09361",
      "tree": "0aea839ad9bd5bfd76c943ed800527e3fcde296f",
      "parents": [
        "c3e2004ebc9eb23c5477e59c9398613d05503fab"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Thu Jul 16 10:14:53 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 10:14:53 2026 +0800"
      },
      "message": "[fix](be) Harden FileScannerV2 filtering and profiling (#65624)\n\nFileScannerV2 had several correctness,\nfallback-contract, lazy-materialization, profiling, and repeated\nschema-work gaps. Scanner slot ids could be mistaken for table-global\nordinals, invalid Parquet dictionary ids could silently reject rows,\ndictionary selection could request a clean fallback after advancing the\nstream, filters that could not be localized still forced eager reads,\nand every split counted all row groups in the file. Equality-delete\nloading also rebuilt its schema-shaped block for each batch."
    },
    {
      "commit": "c3e2004ebc9eb23c5477e59c9398613d05503fab",
      "tree": "d9dcfb2547cb21a3b8423bde760dcf342c40cfb1",
      "parents": [
        "307bf9e333f349726c0d00cfe82c446138809bf2"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Thu Jul 16 10:11:58 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jul 16 10:11:58 2026 +0800"
      },
      "message": "[fix](iceberg) Guard synthesized slots in _create_column_ids (#65610)\n\n### What problem does this PR solve?\nRelated PR: #65502\n\nProblem Summary:\nIceberg Parquet/ORC scans resolve each projected column by name through\nthe StructNode in _create_column_ids. Synthesized/metadata columns — the\nTopN global row-id and the $row_id column — are never serialized into\nthe schema tree, so the node has no entry for them. Calling\nchildren_column_exists() on such an unregistered name hits\nDCHECK(children.contains(name)) and aborts in debug/ASAN builds (and\nthrows std::out_of_range from .at() in release builds), crashing during\nreader init on a TopN projection over an Iceberg table.\n\nGuard the lookup with\nstruct_node-\u003eget_children().contains(slot-\u003ecol_name()) before querying\nthe child (reusing the pattern already used by the equality-delete\nexpand-column path in the same file). Synthesized slots are skipped\ninstead of aborting. Name-based resolution and its\npartial-id/name-mapping correctness are unchanged, and the shared\nchildren_column_exists() helper is left untouched. Applied to both the\nParquet and ORC _create_column_ids.\n\n\nMade enable_file_scanner_v2 a fuzzy , so external regression runs\nrandomly exercise both the V1 and V2 scan paths — exactly the two paths\nthis fix spans.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [x] No.\n    - [ ] Yes. 
\u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    }
  ],
  "next": "307bf9e333f349726c0d00cfe82c446138809bf2"
}