)]}'
{
  "log": [
    {
      "commit": "596862b7a0e6714f81634c685fa5ba6c740b00b8",
      "tree": "9e89e00c111ea689e2aabcbda14ce13da3509553",
      "parents": [
        "1c1c66fecfee3842ea70db9f7264c9cff81f03bd"
      ],
      "author": {
        "name": "meiyi",
        "email": "meiyi@selectdb.com",
        "time": "Tue Jun 02 19:21:56 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 19:21:56 2026 +0800"
      },
      "message": "[fix](fe) cache version and get tablet stats actively for RestoreJob (#62704)"
    },
    {
      "commit": "1c1c66fecfee3842ea70db9f7264c9cff81f03bd",
      "tree": "f1f9d14c0cfa90845523a7cb5ad32fc8bc9a8e84",
      "parents": [
        "80158100b3259a80780f41d51fd1657aeb5f5d33"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Tue Jun 02 18:52:16 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 18:52:16 2026 +0800"
      },
      "message": "[improvement](fe) Add external table metadata profile details (#63648)\n\nProblem Summary: External table queries previously showed only coarse FE\nmetadata time in profile, making it hard to locate slow metadata access\nsteps for Hive, Iceberg, Hudi, and Paimon scans. This change records\ndedicated profile timings for external partition value loading,\npartition metadata loading, partition file listing, and file scan task\nplanning. The scan nodes keep a SummaryProfile reference so asynchronous\nsplit planning can also report its metadata time."
    },
    {
      "commit": "80158100b3259a80780f41d51fd1657aeb5f5d33",
      "tree": "37d969dee04b0c94e2515b806e28697c07616016",
      "parents": [
        "7ec8f7d0d61d1ab47353ce095ac5b879a28fb15e"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Tue Jun 02 18:18:10 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 18:18:10 2026 +0800"
      },
      "message": "[fix](iceberg) Reject iceberg COW table row-level DML (#63950)\n\n### What problem does this PR solve?\nProblem Summary:\n1. Reject UPDATE/DELETE/MERGE DML on Iceberg COW (copy-on-write) tables, which is not currently supported.\n2. Set the default mode for Iceberg tables created by Doris to MOR (merge-on-read).\n\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "7ec8f7d0d61d1ab47353ce095ac5b879a28fb15e",
      "tree": "4dbbdbabf0d28291d768893bb6c32a632615b6e2",
      "parents": [
        "4db157e16fdbea6dbfc44baabf77c7e9d590a709"
      ],
      "author": {
        "name": "zhengyu",
        "email": "zhangzhengyu@selectdb.com",
        "time": "Tue Jun 02 18:03:22 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 18:03:22 2026 +0800"
      },
      "message": "[fix](filecache) exclude warmup reads from file cache hit ratio metrics (#63394)\n\nFile cache hit ratio metrics are derived from global\nfile cache read bytes, but warmup reads from manual warmup, periodic\nwarmup, event-driven warmup, and rebalance-triggered warmup used to\nupdate the same counters as query reads. This polluted the query hit\nratio. Mixed hit/miss reads could also be attributed to one source for\nthe whole request. This change skips warmup updates to global file cache\nread metrics while preserving per-IOContext profile stats, records\nlocal/remote/peer bytes by actual returned bytes, and avoids updating\nmetrics for failed reads. It also fixes direct-read partial continuation\nand no-warmup miss-only hit ratio refresh."
    },
    {
      "commit": "4db157e16fdbea6dbfc44baabf77c7e9d590a709",
      "tree": "3d78694452b213f7d7365a875a034dbffe5d560e",
      "parents": [
        "5b3b20c2f8bcea8cc03d0a3472336e112a90dd31"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Tue Jun 02 17:55:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 17:55:40 2026 +0800"
      },
      "message": "[fix](fe) Support dollar sign in mysql pattern (#63972)\n\nProblem Summary: Querying information_schema.columns with an external\nsystem table name such as table can call\nFrontendServiceImpl.getTableNames with the table name as a pattern. The\nMySQL pattern converter rejected \u0027$\u0027 as a forbidden regex character, so\nthe FE thrift service threw an internal getTableNames error before\nmetadata resolution. This change treats \u0027$\u0027 as a literal character in\nMySQL patterns by escaping it for the generated Java regex, allowing\nsystem table names to be matched safely.\n\n### Release note\n\nFixes metadata lookup for table names containing \u0027$\u0027, including external\nsystem tables such as table."
    },
    {
      "commit": "5b3b20c2f8bcea8cc03d0a3472336e112a90dd31",
      "tree": "2d9803ad1d5c57a68fb8a475162d6763f7b78cc6",
      "parents": [
        "b1112e5dbdc628bffa9019fe92d6f843c94b7b3e"
      ],
      "author": {
        "name": "starocean999",
        "email": "lichi@selectdb.com",
        "time": "Tue Jun 02 17:48:27 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 17:48:27 2026 +0800"
      },
      "message": "[fix](asof_join)PhysicalHashJoin\u0027s computeUniform method should process asof join properly (#62730)\n\n### What problem does this PR solve?\nExtend PhysicalHashJoin to correctly handle ASOF join variants so trait\npropagation, equal-set extraction, and functional dependency (FD)\ncalculations work for ASOF joins."
    },
    {
      "commit": "b1112e5dbdc628bffa9019fe92d6f843c94b7b3e",
      "tree": "b1d44409859f8c782c59558440a50f13c87ac73b",
      "parents": [
        "07b497ab2192a8eb3ccfe1e0c41a48123c646e1e"
      ],
      "author": {
        "name": "Pxl",
        "email": "xl@selectdb.com",
        "time": "Tue Jun 02 17:04:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 17:04:36 2026 +0800"
      },
      "message": "[improvement](fe) Avoid two-phase agg for single instance (#63732)\n\nIn a single-BE single-instance execution, non-distinct\naggregation does not benefit from splitting into local and global\nphases. The split can add an unnecessary hash exchange and extra\naggregate operator for high-cardinality group-by queries. This change\ndetects the single execution instance case during Nereids non-distinct\naggregate implementation and only generates the one-phase aggregate\ncandidate. It also lets the global aggregate request ANY child\ndistribution in that case so the optimizer does not add a redundant\nexchange.\n\n### Release note\n\nOptimize aggregation planning for single-BE single-instance execution by\navoiding unnecessary local/global aggregate split."
    },
    {
      "commit": "07b497ab2192a8eb3ccfe1e0c41a48123c646e1e",
      "tree": "476bb07ff23070d891bb24000dbe2bd25330c4cf",
      "parents": [
        "cddb80e84872517dade5e3420bdc16bf16381224"
      ],
      "author": {
        "name": "TengJianPing",
        "email": "tengjianping@selectdb.com",
        "time": "Tue Jun 02 16:22:25 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 16:22:25 2026 +0800"
      },
      "message": "[fix](be) Avoid UB from unaligned __int128 dereference (#63703)\n\nIssue Number: close #xxx\n\nProblem Summary: Several BE call sites obtained a byte pointer from\nStringRef::data / Slice::data / a generic const void* (e.g. ORC pushdown\nliteral value, JSONB serde, runtime filter literal builder, meta_tool\ncolumn dump) and dereferenced it as `__int128*` / `int128_t*` /\n`DecimalV2Value*` / `Decimal\u003cint128_t\u003e*`.\n\nBecause those buffers carry no 16-byte alignment guarantee, the load is\nundefined behavior. On alignment-strict targets (some aarch64 / SPARC\nbuilds) and under UBSan -fsanitize\u003dalignment the read can SIGBUS, abort,\nor - with SSE codegen for __int128 - fault on a movdqa instruction.\n\nSites fixed:\n- be/src/core/data_type_serde/data_type_number_serde.cpp (LARGEINT\nJSONB)\n- be/src/format/orc/vorc_reader.cpp (TYPE_DECIMALV2 / TYPE_DECIMAL128I\nliteral conversion for ORC predicate push-down)\n- be/src/tools/meta_tool.cpp (LARGEINT and DECIMAL128I dump)\n- be/src/exprs/vexpr.h create_texpr_literal_node\u003c\u003e: TYPE_LARGEINT,\nTYPE_DECIMALV2 and TYPE_DECIMAL128I literal construction\n\nAll these sites now load the 16-byte value through the\n`unaligned_load\u003cT\u003e` helper from `util/unaligned.h` into a local __int128\n/ DecimalV2Value / Decimal\u003cint128_t\u003e before use. Modern compilers reduce\nthe helper\u0027s memcpy to a load, so there is no measurable performance\nimpact, but the semantics become well-defined regardless of the\nproducer\u0027s alignment.\n\nNote: be/src/runtime/fold_constant_executor.cpp also contains unaligned\n__int128 reads for TYPE_LARGEINT and TYPE_DECIMALV2 in `_get_result`,\nbut that branch is unreachable under the current Nereids planner (which\nalways sets `is_nereids \u003d true` and uses `be_exec_version \u003e\u003d 4`, taking\nthe protobuf serde path). 
It is left untouched here to keep the diff\nfocused; the dead branch can be cleaned up separately.\n\n### Release note\n\nNone"
    },
    {
      "commit": "cddb80e84872517dade5e3420bdc16bf16381224",
      "tree": "ac282ff29d8b60efb9af7f61b30db13a1448a83e",
      "parents": [
        "aa9162840f154159943a444ac222d70d69a7a0c2"
      ],
      "author": {
        "name": "TengJianPing",
        "email": "tengjianping@selectdb.com",
        "time": "Tue Jun 02 16:17:06 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 16:17:06 2026 +0800"
      },
      "message": "[fix](be) Fix DCHECK in LocalExchangeSharedState::sub_total_mem_usage (#63742)\n\nProblem Summary: In LocalExchangeSharedState::sub_total_mem_usage(),\n`mem_usage` is `std::atomic\u003cint64_t\u003e` but `delta` is `size_t`. The\nexisting debug check\n\n    DCHECK_GE(prev_usage - delta, 0);\n\nwas never effective: the usual arithmetic conversions promote\n`prev_usage - delta` to `size_t`, and an unsigned expression is\ntrivially `\u003e\u003d 0`. So the guard against `mem_usage` underflow\n(subtracting more than was added) silently passed in all debug builds,\nleaving any over-subtraction undetected.\n\nFix: compare `prev_usage` (int64_t) against `cast_set\u003cint64_t\u003e(delta)`\nso the comparison is performed entirely in signed space, and a real\nunderflow will actually trip the DCHECK with the original prev_usage and\ndelta values in the failure message. The release-mode guard on the next\nline (`cast_set\u003cint64_t\u003e(prev_usage - delta)` throws on underflow\nbecause the wrapped size_t result exceeds INT64_MAX) is preserved as-is."
    },
    {
      "commit": "aa9162840f154159943a444ac222d70d69a7a0c2",
      "tree": "8a1437de1737dc10320aaccf96a1911ce1c08334",
      "parents": [
        "74d5c5b460863162c3ac65bffcac785d363ad17c"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Tue Jun 02 14:18:45 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 14:18:45 2026 +0800"
      },
      "message": "[fix](iceberg) Add missing Iceberg field IDs for position delete files. (#63483)\n\n### Release note\n\nNone"
    },
    {
      "commit": "74d5c5b460863162c3ac65bffcac785d363ad17c",
      "tree": "b4daa09eb7f859c0c99bc9a0abce0b116593294d",
      "parents": [
        "a0a09b0eac406d143a57752ff06430e2dd14d4e0"
      ],
      "author": {
        "name": "Pxl",
        "email": "xl@selectdb.com",
        "time": "Tue Jun 02 14:15:49 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 14:15:49 2026 +0800"
      },
      "message": "[Feature](scan) support runtime partition prune (#62589)\n\n```sql\n\n-- \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n-- PART A — RANGE-partitioned fact table\n--   4096 partitions, ~50M rows\n-- \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n\nDROP TABLE IF EXISTS rf_perf_fact;\nCREATE TABLE rf_perf_fact (\n    id        BIGINT       NOT NULL,\n    part_col  INT          NOT NULL,\n    payload   VARCHAR(64)  NOT NULL\n)\nDUPLICATE KEY(id)\nPARTITION BY RANGE(part_col) (\n    -- 4096 buckets of width 1024 covering [0, 4096*1024)\n    FROM (0) TO (4194304) INTERVAL 1024\n)\nDISTRIBUTED BY HASH(id) BUCKETS 4\nPROPERTIES (\"replication_num\" \u003d \"1\");\n\n-- 50M rows; part_col uniformly spread across the 4096 partitions\nINSERT INTO rf_perf_fact\nSELECT\n    number                              AS id,\n    CAST(number % 4194304 AS INT)       AS part_col,\n    CONCAT(\u0027p\u0027, CAST(number AS STRING)) AS payload\nFROM numbers(\"number\" \u003d \"50000000\");\n\nDROP TABLE IF EXISTS rf_perf_dim;\nCREATE TABLE rf_perf_dim (\n    dim_key  INT NOT NULL\n)\nDUPLICATE KEY(dim_key)\nDISTRIBUTED BY HASH(dim_key) BUCKETS 
1\nPROPERTIES (\"replication_num\" \u003d \"1\");\n\n-- 5 keys, all in [0, 1024)  \u003d\u003e all hit partition #0\nINSERT INTO rf_perf_dim VALUES (1), (7), (42), (100), (999);\n\nANALYZE TABLE rf_perf_fact WITH SYNC;\nANALYZE TABLE rf_perf_dim  WITH SYNC;\n\n-- \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n-- Baseline: feature OFF — BE must scan every partition and filter row by row\n-- \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nSET enable_runtime_filter_partition_prune \u003d false;\n\n-- warm up file cache / page cache\nSELECT COUNT(*) FROM rf_perf_fact f JOIN rf_perf_dim d ON f.part_col \u003d d.dim_key;\n\nSELECT /*+ SET_VAR(runtime_filter_type\u003d\u0027IN_OR_BLOOM_FILTER\u0027) */\n       \u0027rf_prune_OFF\u0027 AS tag,\n       COUNT(*)       AS matched_rows\nFROM rf_perf_fact f\nJOIN rf_perf_dim  d ON f.part_col \u003d d.dim_key;\n\n-- 
\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n-- Optimized: feature ON — BE drops 1022 of 1024 partitions before scanning\n-- \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\nSET enable_runtime_filter_partition_prune \u003d true;\n\n-- warm up under the new path too\nSELECT COUNT(*) FROM rf_perf_fact f JOIN rf_perf_dim d ON f.part_col \u003d d.dim_key;\n\nSELECT /*+ SET_VAR(runtime_filter_type\u003d\u0027IN_OR_BLOOM_FILTER\u0027) */\n       \u0027rf_prune_ON\u0027 AS tag,\n       COUNT(*)      AS matched_rows\nFROM rf_perf_fact f\nJOIN rf_perf_dim  d ON f.part_col \u003d d.dim_key;\n```\nbefore:\n```\n- ExecTime: 468.760ms\n- NumScanners: 2.048K (2048)\n- TabletNum: 2.048K (2048)\n```\nafter:\n```\n- ExecTime: 43.52ms\n- TabletsPrunedByRuntimeFilter: 2.047K (2047)\n- NumScanners: 1\n```\n\n```\nFragment RPC Phase1 Time: 247ms -\u003e 335ms\nWait and Fetch Result Time: 541ms -\u003e35ms\nTotal: 868ms -\u003e 450ms\n```\n\nThis pull request introduces runtime filter partition pruning to the\nOLAP scan operator, enabling more efficient query execution by skipping\nunnecessary partitions earlier in the scan process. 
The main changes\ninclude adding partition boundary parsing and sharing, integrating\nruntime filter partition pruning logic into scan initialization and\nruntime filter updates, and adding related profiling counters for\nobservability.\n\n**Runtime filter partition pruning integration:**\n\n* Added logic to parse partition boundaries once per fragment in\n`OlapScanOperatorX::prepare`, storing the result in a shared\n`ParsedPartitionBoundaries` object for all scan instances.\n(`be/src/exec/operator/olap_scan_operator.cpp`,\n`be/src/exec/operator/scan_operator.h`,\n`be/src/exec/operator/scan_operator.cpp`,\n[[1]](diffhunk://#diff-3ddc75656071d9c0e6b0be450e152a1c94559f7e70ea820e7f0c80a7078e3292R1242-R1268)\n[[2]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R384-R393)\n[[3]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R471-R476)\n* Integrated partition pruning checks into the scan initialization and\nruntime filter update flows, ensuring that tablets and partitions pruned\nby runtime filters are skipped before scanner construction and scan IO.\n(`be/src/exec/operator/olap_scan_operator.cpp`,\n`be/src/exec/operator/scan_operator.cpp`,\n[[1]](diffhunk://#diff-3ddc75656071d9c0e6b0be450e152a1c94559f7e70ea820e7f0c80a7078e3292R605-R648)\n[[2]](diffhunk://#diff-3c4794a864169735a628d8fcb1de986523a9516aad318aa0a893913f4ce031d4R106-R132)\n[[3]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262L284-R304)\n\n**Profiling and observability:**\n\n* Added new runtime profile counters to track the number of partitions\nand tablets pruned by runtime filters, as well as the total number of\npartitions considered for 
pruning.\n(`be/src/exec/operator/olap_scan_operator.cpp`,\n`be/src/exec/operator/scan_operator.cpp`,\n[[1]](diffhunk://#diff-3ddc75656071d9c0e6b0be450e152a1c94559f7e70ea820e7f0c80a7078e3292R130-R131)\n[[2]](diffhunk://#diff-3c4794a864169735a628d8fcb1de986523a9516aad318aa0a893913f4ce031d4R1123-R1127)\n\n**Code organization and extensibility:**\n\n* Refactored scan operator and local state classes to support\nscan-agnostic runtime filter partition pruning, allowing future scan\ntypes to reuse the new pruning logic.\n(`be/src/exec/operator/scan_operator.h`,\n[[1]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R89-R92)\n[[2]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R107-R113)\n[[3]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R149-R153)\n[[4]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R384-R393)\n[[5]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R471-R476)\n\n**Supporting changes:**\n\n* Added necessary includes and forward declarations for new data\nstructures and utilities.\n(`be/src/exec/operator/olap_scan_operator.cpp`,\n`be/src/exec/operator/olap_scan_operator.h`,\n`be/src/exec/operator/operator.h`,\n[[1]](diffhunk://#diff-3ddc75656071d9c0e6b0be450e152a1c94559f7e70ea820e7f0c80a7078e3292R25)\n[[2]](diffhunk://#diff-3ddc75656071d9c0e6b0be450e152a1c94559f7e70ea820e7f0c80a7078e3292R37)\n[[3]](diffhunk://#diff-235cc51f4698ebf0ebe796e7253912907bb7c1caf4efcac92538ac1ff2eac171R22-R31)\n[[4]](diffhunk://#diff-a45108106b12759815ac5991d56a9d6ead7ffb9c915fb0cf7b6f0b8123b30262R31)\n[[5]](diffhunk://#diff-43818c734eb80e2eab6dbaa015b0f8b9bc7509bf3838e01162c6737f60a83809R58)\n\nThese changes collectively enable more efficient query execution by\navoiding unnecessary work on partitions that can be pruned by runtime\nfilters, and lay the groundwork for further improvements in partition\npruning 
across scan types.\n\n---------\n\nCo-authored-by: Copilot \u003c223556219+Copilot@users.noreply.github.com\u003e"
    },
    {
      "commit": "a0a09b0eac406d143a57752ff06430e2dd14d4e0",
      "tree": "20683d9a7934b0d89a68639b7854000b42f53cee",
      "parents": [
        "61ca8bd9bc6d7513a62b69c9c6b7e89e22d175be"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Tue Jun 02 13:50:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 13:50:32 2026 +0800"
      },
      "message": "[refine](function) use concrete column pointers for local result columns (#63938)\n\n### What problem does this PR solve?\n\n\nSome BE expression and storage code creates a concrete column type and\nthen immediately casts the generic `ColumnPtr` or `MutableColumnPtr`\nback to the same concrete type before writing data. This adds\nunnecessary casts and makes the ownership intent less direct. Root\ncause: several local result columns were declared as generic column\npointers even though the concrete column type was already known at\ncreation time.\n\nThis PR refines those local variables to keep concrete column pointers\nwhere the type is explicit, and directly accesses the concrete column\ndata. It also updates the explode-numbers table function member to use a\nconcrete column pointer. The change is limited to local refactoring and\ndoes not change runtime behavior.\n\n### Release note\n\nNone"
    },
    {
      "commit": "61ca8bd9bc6d7513a62b69c9c6b7e89e22d175be",
      "tree": "6838319dcd100b9aefe7291c929cf3457cea6ffe",
      "parents": [
        "6f9ab8e88c6250db2cd075ac00e2e23d4037a24e"
      ],
      "author": {
        "name": "zhangstar333",
        "email": "zhangsida@selectdb.com",
        "time": "Tue Jun 02 13:01:12 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 13:01:12 2026 +0800"
      },
      "message": "[refact](udf) remove the udf cache expiration_time property (#63897)\n\n### What problem does this PR solve?\nProblem Summary:\ndoc https://github.com/apache/doris-website/pull/3845\n```\nCREATE FUNCTION print_12() RETURNS int \nPROPERTIES (\n    \"file\" \u003d \"file:///path/to/java-udf-demo-jar-with-dependencies.jar\",\n    \"symbol\" \u003d \"org.apache.doris.udf.Print\", \n    \"always_nullable\"\u003d\"true\",\n    \"type\" \u003d \"JAVA_UDF\",\n    \"static_load\" \u003d \"true\", // default value is false\n    \"expiration_time\" \u003d \"60\" // default value is 360 minutes\n);\n```\n\n```\nPreviously, a Java UDF could use static_load and expiration_time to control how long its jar stayed cached in BE.\nA background thread scanned the jars every ten minutes, checked each jar\u0027s init time, and dropped the jar once it had expired.\nThis could cause long-running queries to fail when the background thread removed the jar.\nNow expiration_time is removed, and the jar is cleaned up immediately when the function is dropped.\n\n```"
    },
    {
      "commit": "6f9ab8e88c6250db2cd075ac00e2e23d4037a24e",
      "tree": "f8801a538860e8ad38943153eb3af04c99582013",
      "parents": [
        "6e900dd2310b2f726b66bc7d6d0e63f572dd3e9d"
      ],
      "author": {
        "name": "Userwhite",
        "email": "49226823+Userwhite@users.noreply.github.com",
        "time": "Tue Jun 02 11:42:45 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:42:45 2026 +0800"
      },
      "message": "[Feature] support binlog replica schedule and compaction (row type) (3/3) (#63643)\n\n### What problem does this PR solve?\n\nIssue Number: close https://github.com/apache/doris/issues/61956\n\n #### Support Row Binlog Clone and Compaction\n\n  ##### Motivation\n\n  This PR adds BE storage support:\n1. row binlog clone ( fix missing version problem)\n2. row binlog compaction. (prevent too many small files)\n\n  ##### Clone\n\n  clone binlog files like ccr\n\n  ##### Compaction\n```\n    // Binlog compaction selection rules (tiered, L0..LMax)\n    //\n    // Score / Permits\n    // - For LMax, treat Base([0-x]) as score/permit\u003d1, others use RowsetMeta::get_compaction_score().\n    //\n    // Trigger (all levels): merge when ANY holds\n    // - size \u003e\u003d binlog_compaction_goal_size_mbytes * 1MB\n    // - score \u003e\u003d binlog_compaction_file_count_threshold\n    // - time \u003e\u003d binlog_compaction_time_threshold_seconds\n    //\n    // LMax \"Base + `ENOUGH` + remaining\" model (oldest -\u003e newest)\n    //   | Base([0-x]) | `ENOUGH` rowsets | remaining rowsets ... |\n    // `ENOUGH` is computed dynamically on LMax (not persisted):\n    //      (rowset_size \u003e\u003d goal_size) OR (rowset_score \u003e\u003d file_count_threshold)\n    //\n    // Input Rowsets selection:\n    // - If physical rewrite trigger is NOT met: try quick compact first (requires Base([0-x])).\n    // - If both quick compact and physical rewrite are possible: compare score and pick the higher.\n    //\n    // Quick compact output must be OVERLAPPING.\n```\n\n  --------\n\n  ## Summary\n\n  This PR makes row binlog a first-class storage object in BE.\n\n  It adds:\n\n  1. ROW_BINLOG copy type for clone/snapshot.\n  2. Snapshot support for row binlog rowsets and binlog_delvec.\n  4. Clone support for downloading and linking row binlog files.\n  5. Tablet meta support for managing row_binlog_rs_metas.\n  6. 
Dedicated tiered compaction for row binlog rowsets.\n\n  The result is:\n\n  • row binlog can be correctly cloned between replicas\n  • MOW row binlog delete information is preserved\n  • row binlog file/meta count can be reduced by compaction"
    },
    {
      "commit": "6e900dd2310b2f726b66bc7d6d0e63f572dd3e9d",
      "tree": "4b1999094bd675def2fa26480f6fb9b2e7b5694a",
      "parents": [
        "b640914bf36ad6e913d71d77eb499d001d56789f"
      ],
      "author": {
        "name": "Raiden",
        "email": "zhanggen.Jung@gmail.com",
        "time": "Tue Jun 02 11:32:14 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:32:14 2026 +0800"
      },
      "message": "[improvement](fe) Support LDAP default roles (#63411)\n\n### What problem does this PR solve?\n\nProblem Summary:\n\nLDAP temporary users could only receive roles mapped from LDAP groups\nand the built-in information_schema-only role. This PR adds\n`ldap_default_roles` so every LDAP-authenticated user can receive\nconfigured Doris roles while still keeping LDAP group roles.\n\n### Release note\n\nSupport configuring default Doris roles for LDAP-authenticated users\nthrough `ldap_default_roles`."
    },
    {
      "commit": "b640914bf36ad6e913d71d77eb499d001d56789f",
      "tree": "2867b4f261740c212e3364abbf6620cd442ea4c9",
      "parents": [
        "7c9c366718206f7651213e97a34cb770aec51881"
      ],
      "author": {
        "name": "hui lai",
        "email": "laihui@selectdb.com",
        "time": "Tue Jun 02 11:14:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:14:17 2026 +0800"
      },
      "message": "[opt](memory) release packed file writer buffer after flush (#63967)\n\n### What problem does this PR solve?\n\nPackedFileWriter buffers data for files smaller than\nsmall_file_threshold_bytes before deciding whether to pack them into a\npacked file or switch to direct write. The buffered data is stored in a\nstd::string. After the buffered data is flushed to the inner writer or\nsubmitted to PackedFileManager, the old code only called clear(), which\nresets size but keeps capacity. When segment file writers are still\nretained by upper-level rowset structures after close, this retained\ncapacity can keep a large amount of memory alive and show up under\nPackedFileWriter::appendv in memory profiling:\n\u003cimg width\u003d\"800\" height\u003d\"1180\" alt\u003d\"image\"\nsrc\u003d\"https://github.com/user-attachments/assets/7e0e2c40-c35b-4bfc-b45b-aeed31c29771\"\n/\u003e\n\n\nThis change reserves the final append size before buffering to reduce\nrepeated std::string growth, and releases the buffer capacity after the\ndata has been flushed or submitted."
    },
    {
      "commit": "7c9c366718206f7651213e97a34cb770aec51881",
      "tree": "c1aa7da16fdfbd57962f3c5f3be1e0bbf44caa1b",
      "parents": [
        "d073c953e8a44abfadbcde8b16cc8d1903c42fd7"
      ],
      "author": {
        "name": "bobhan1",
        "email": "baohan@selectdb.com",
        "time": "Tue Jun 02 11:08:10 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:08:10 2026 +0800"
      },
      "message": "[fix](cloud) Normalize SC rowset graph before delete bitmap capture (#63960)\n\n## Proposed changes\n\nThis PR fixes the remaining MOW schema-change delete-bitmap path after\n#62256.\n\n#62256, whose master commit is\n`dd59f479af5a855401e3f862c751e8416070a1e2`, fixed the final\nschema-change commit path by deleting local rowsets in `[2,\nalter_version]` before adding the schema-change output rowsets to the\nreal new tablet. That keeps the committed tablet rowset graph aligned\nwith the Meta Service result.\n\nHowever, the delete-bitmap recompute path still builds and uses a\ntemporary tablet in `CloudSchemaChangeJob::_process_delete_bitmap()`.\nThat temporary tablet is initialized with the schema-change output\nrowsets, but after each `sync_tablet_rowsets(tmp_tablet)` it can again\ncontain non-schema-change local rowsets in `[2, alter_version]`, such as\ndouble-write rowsets or compaction output rowsets.\n\nIf the temporary tablet graph contains both:\n\n- schema-change output rowsets, for example `[2]`, `[3]`, ...\n- a wider local/compaction rowset, for example `[2-3]`\n\nthen `capture_consistent_rowsets()` can choose the wider\nnon-schema-change rowset from the temporary graph instead of the\nschema-change output rowsets. The delete bitmap is then recomputed\nagainst a rowset path that is not the one finally committed for the\nschema-changed tablet. 
A later MOW compaction may observe delete-bitmap\ncoverage inconsistent with the visible rowset graph and fail\nrow-count/delete-bitmap correctness checks.\n\nThe fix is to normalize the temporary tablet rowset graph immediately\nafter every `sync_tablet_rowsets(tmp_tablet)` and before capturing\nrowsets for delete-bitmap recomputation.\n\nConcretely this PR:\n\n- extracts `CloudTablet::replace_rowsets_with_schema_change_output()`;\n- removes non-schema-change local rowsets in `[2, alter_version]` from\nboth `_rs_version_map` and the version graph before adding schema-change\noutput rowsets;\n- reuses the helper in the real schema-change commit path;\n- calls the same helper after both tmp-tablet syncs in\n`_process_delete_bitmap()`;\n- keeps cache/delete-bitmap cleanup only for the real tablet, while the\ntemporary tablet only normalizes its local graph;\n- adds a unit test that simulates a polluted tmp graph with `[2]`,\n`[3]`, and a stale compaction rowset `[2-3]`.\n\n## Root cause\n\n#62256 fixed the final commit graph but not the earlier delete-bitmap\nrecompute graph.\n\nThe final tablet graph and the temporary delete-bitmap tablet graph must\nuse the same schema-change output rowset path for historical versions.\nOtherwise delete bitmap recomputation may be based on a different rowset\npath from the one that becomes visible after schema change.\n\nThis is why the issue can surface in a compaction after schema change\nhas finished: the compaction output itself does not need to contain\nduplicate rows. The failure comes from delete bitmap state being\nrecomputed from a polluted temporary rowset graph and later being\napplied to the committed schema-change graph.\n\n## Testing\n\n```\n./run-be-ut.sh --run --filter\u003dCloudTabletDeleteRowsetsForSchemaChangeTest.* -j100\n```"
    },
    {
      "commit": "d073c953e8a44abfadbcde8b16cc8d1903c42fd7",
      "tree": "d7fac5b826e6c809bcc1a2bafc33be658833b4ed",
      "parents": [
        "55769485e171fe9b7799aed3e99b1c28d7df6238"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Tue Jun 02 11:04:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 11:04:08 2026 +0800"
      },
      "message": "[Fix](pyudf) Convert nested map value correctly (#63907)\n\nProblem Summary:\n\nFix Python UDF nested complex type conversion when `MAP` appears inside\n`ARRAY`, `STRUCT`, or vectorized inputs.\n\nPreviously, Python UDF argument conversion mostly relied on PyArrow\u0027s\ndefault conversions(`Scalar.as_py()`, `Array.to_pylist()`,\n`Array.to_pandas()`). Those APIs convert a top-level Arrow `MAP` into\nPython-friendly values in some paths, but nested `MAP` values are\nexposed as list-of-tuples. For example, `ARRAY\u003cMAP\u003cSTRING, INT\u003e\u003e` could\narrive in Python as `[[(\u0027a\u0027, 1)]]` instead of `[{\u0027a\u0027: 1}]`. This made\nuser UDF code see nested maps as `list` instead of `dict`.\n\nThis PR introduces a recursive Arrow-value conversion helper and applies\nit consistently across Python UDF argument conversion paths. The helper\nmanually reconstructs Python values according to the Arrow type:\n- `MAP` -\u003e `dict`\n- `LIST` / `LARGE_LIST` -\u003e `list`\n- `STRUCT` -\u003e `dict`\n\nbefore\n```sql\nCREATE FUNCTION py_deep_nested_debug(ARRAY\u003cMAP\u003cSTRING, ARRAY\u003cINT\u003e\u003e\u003e )\nRETURNS STRING\nPROPERTIES (\n    \"type\" \u003d \"PYTHON_UDF\",\n    \"symbol\" \u003d \"evaluate\",\n    \"runtime_version\" \u003d \"3.12.11\",\n    \"always_nullable\" \u003d \"true\"\n)\nAS $$\ndef evaluate(arr):\n    if arr is None:\n        return \u0027None\u0027\n    return \u0027outer_type\u003d{}, outer_repr\u003d{}\u0027.format(type(arr).__name__, repr(arr))\n$$;\n\nSELECT py_deep_nested_debug([{\u0027a\u0027: [1, 2], \u0027b\u0027: [3]}, {\u0027c\u0027: [4, 5, 6]}]);\n+-------------------------------------------------------------------------------+\n| py_deep_nested_debug([{\u0027a\u0027: [1, 2], \u0027b\u0027: [3]}, {\u0027c\u0027: [4, 5, 6]}])             |\n+-------------------------------------------------------------------------------+\n| outer_type\u003dlist, outer_repr\u003d[[(\u0027a\u0027, [1, 2]), 
(\u0027b\u0027, [3])], [(\u0027c\u0027, [4, 5, 6])]] |\n+-------------------------------------------------------------------------------+\n```\n\nnow:\n```text\nSELECT py_deep_nested_debug([{\u0027a\u0027: [1, 2], \u0027b\u0027: [3]}, {\u0027c\u0027: [4, 5, 6]}]);\n+-------------------------------------------------------------------------+\n| py_deep_nested_debug([{\u0027a\u0027: [1, 2], \u0027b\u0027: [3]}, {\u0027c\u0027: [4, 5, 6]}])       |\n+-------------------------------------------------------------------------+\n| outer_type\u003dlist, outer_repr\u003d[{\u0027a\u0027: [1, 2], \u0027b\u0027: [3]}, {\u0027c\u0027: [4, 5, 6]}] |\n+-------------------------------------------------------------------------+\n```"
    },
    {
      "commit": "55769485e171fe9b7799aed3e99b1c28d7df6238",
      "tree": "ecc0f9ba298c2fc675b405e245bde439b77d2fb1",
      "parents": [
        "eab8ef409ea89baa4a1e76988afa4fc07e19b474"
      ],
      "author": {
        "name": "Arpit Jain",
        "email": "3242828+arpitjain099@users.noreply.github.com",
        "time": "Tue Jun 02 11:54:52 2026 +0900"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:54:52 2026 +0800"
      },
      "message": "[doc](docs) Fix Apache license compliance wording (#63186)\n\n### What problem does this PR solve?\n\nProblem Summary:\nThe License note in `README.md` uses awkward grammar: \"to be complied\nwith Apache 2.0 License\". This updates the sentence to \"to comply with\nApache 2.0 License\" for clearer documentation.\n\nSigned-off-by: Arpit Jain \u003carpitjain099@gmail.com\u003e"
    },
    {
      "commit": "eab8ef409ea89baa4a1e76988afa4fc07e19b474",
      "tree": "c18106bc84e7e4066528b6370186debc3018d978",
      "parents": [
        "23e21f44f0080e25deff5dcdda2e61af3efc9480"
      ],
      "author": {
        "name": "Arpit Jain",
        "email": "3242828+arpitjain099@users.noreply.github.com",
        "time": "Tue Jun 02 11:51:06 2026 +0900"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:51:06 2026 +0800"
      },
      "message": "[fix](build) Bump UI axios to patched release (#63185)\n\n### What problem does this PR solve?\n\nProblem Summary:\nThe legacy UI package depends on `axios` `^0.19.2`, which is affected by\nknown security advisories. This updates the dependency to a patched\nrelease line (`^1.16.0`) to reduce exposure from vulnerable transitive\nHTTP client behavior.\n\nSigned-off-by: Arpit Jain \u003carpitjain099@gmail.com\u003e"
    },
    {
      "commit": "23e21f44f0080e25deff5dcdda2e61af3efc9480",
      "tree": "bd105c3851080312c1f24b163bfc3a9d565f6540",
      "parents": [
        "14f7cd2247e1023888dd1e51f2d5a426e6c221cf"
      ],
      "author": {
        "name": "Sim Chou",
        "email": "466902955@qq.com",
        "time": "Tue Jun 02 10:38:52 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:38:52 2026 +0800"
      },
      "message": "[improve](nereids) filter nereidsPrunedTabletIds per partition in distributionPrune (#63851)\n\n### What problem does this PR solve?\n#53403 short-circuited distributionPrune to return the entire\nnereidsPrunedTabletIds set when running under Nereids. However, the\ncaller computeTabletInfo invokes distributionPrune inside a\nper-partition loop and then iterates the returned ids, calling\nMaterializedIndex.getTablet(id) on each. When nereidsPrunedTabletIds\ncontains tablets across many\npartitions, every per-partition iteration walks the entire global set\nand does a getTablet hash lookup on ids that belong to other partitions\n(which are then filtered out by the null check), yielding O(partitionNum\n* globalPrunedSize) lookups. The short-circuit also copies the full\nHashSet into a new ArrayList once per partition.\n\nFilter the global set down to the current partition\u0027s tablet ids\n(tabletIdsInOrder, already prepared by the caller) before returning. The\nresult is identical to what the caller\u0027s null-check would have produced,\nso behavior is unchanged; only the redundant lookups and copies are\neliminated. The non-Nereids path, the sampleTabletIds path and the\nempty-set\nfallback are untouched.\n\nIssue Number: close #63854\n\nRelated PR: #53403\n\nProblem Summary:\n\nPlan time of OlapScan queries with many partitions and many globally\npruned tablets degrades quadratically due to redundant per-partition\niterations over the global pruned tablet set in\nOlapScanNode.distributionPrune. Restore per-partition complexity by\nfiltering the global set down to the current partition\u0027s tablets before\nreturning.\n\nCo-authored-by: zhousimin \u003czhousimin@kuaishou.com\u003e"
    },
    {
      "commit": "14f7cd2247e1023888dd1e51f2d5a426e6c221cf",
      "tree": "e3b0dd28e41021ed1340c3c30f0050e62e50e6ac",
      "parents": [
        "f1ad42e83189d02074861830377a813f79fad31a"
      ],
      "author": {
        "name": "seawinde",
        "email": "wusi@selectdb.com",
        "time": "Tue Jun 02 10:19:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:19:36 2026 +0800"
      },
      "message": "[feature](function) Support murmur_hash3_128 function (#63196)\n\nThis PR adds a builtin 128-bit MurmurHash3 scalar function,\n`murmur_hash3_128`, for callers that need a wider hash value than the\nexisting 32-bit and 64-bit variants.\n\n| File | Change Description |\n|------|--------------------|\n| `be/src/exprs/function/function_hash.cpp` | Adds BE implementation for\n`murmur_hash3_128`, returning LARGEINT. |\n| `be/test/exprs/function/function_hash_test.cpp` | Adds unit coverage\nfor constant and multi-argument hash cases. |\n| `BuiltinScalarFunctions.java` | Registers the Nereids scalar function.\n|\n| `MurmurHash3128.java` | Adds FE scalar function metadata and\nsignatures. |\n| `ScalarFunctionVisitor.java` | Adds visitor entry for the new scalar\nfunction. |\n\nDesign rationale: the function reuses the existing MurmurHash3 x64\n128-bit processing path in BE and exposes the packed 128-bit result as\nLARGEINT, matching Doris\u0027 existing signed 128-bit integer\nrepresentation.\n\nBE execution logic:\n\n`murmur_hash3_128` follows the same variadic hash execution model as the\nexisting Doris hash functions. The BE function framework does not\nevaluate all arguments for one row in a single call. Instead,\n`FunctionVariadicArgumentsBase` invokes the implementation once per\nargument column:\n\n```text\nfirst_apply(arg0, result_column)\ncombine_apply(arg1, result_column)\ncombine_apply(arg2, result_column)\n...\n```\n\nFor a query such as:\n\n```sql\nSELECT murmur_hash3_128(k1, \u0027world\u0027) FROM t;\n```\n\nthe first round processes the whole `k1` column with `execute\u003ctrue\u003e()`.\nIt creates one LARGEINT result slot per row and initializes each row\u0027s\nMurmurHash3 128-bit state from seed `0`. The second round processes the\nconstant argument `\u0027world\u0027` with `execute\u003cfalse\u003e()`. 
It reads each row\u0027s\nprevious state from the result column, updates that state with the new\nargument bytes, and writes the packed state back.\n\nFor example, if `k1` has two rows:\n\n```text\nrow0: \"hello\"\nrow1: \"apache\"\n```\n\nthe state evolves as:\n\n```text\nInitial result column:\nrow0: empty\nrow1: empty\n\nRound 1: execute\u003ctrue\u003e() for k1\nrow0: init_hash(\"hello\")  -\u003e state_hello\nrow1: init_hash(\"apache\") -\u003e state_apache\n\nResult column:\nrow0: pack(state_hello)\nrow1: pack(state_apache)\n\nRound 2: execute\u003cfalse\u003e() for \u0027world\u0027\nrow0: unpack(state_hello)  -\u003e update_hash(\"world\") -\u003e state_hello_world\nrow1: unpack(state_apache) -\u003e update_hash(\"world\") -\u003e state_apache_world\n\nFinal result column:\nrow0: pack(state_hello_world)\nrow1: pack(state_apache_world)\n```\n\nThe underlying MurmurHash3 128-bit state is the pair `(h1, h2)`. For a\nsingle argument, the implementation can call the existing\n`murmur_hash3_x64_128(data, len, 0, out)` directly. For multiple\narguments, each later argument must continue from the previous `(h1,\nh2)` state; calling `murmur_hash3_x64_128` independently for every\nargument would restart from seed `0` and lose the effect of earlier\narguments.\n\nBecause the SQL return type is LARGEINT and the BE result column stores\none `__int128_t` value per row, the implementation packs `(h1, h2)` into\nthe result column between argument rounds and unpacks it before\nprocessing the next argument:\n\n```text\nhigh 64 bits                 low 64 bits\n+-------------------------+-------------------------+\n|           h2            |           h1            |\n+-------------------------+-------------------------+\n```"
    },
    {
      "commit": "f1ad42e83189d02074861830377a813f79fad31a",
      "tree": "6e070f974cd23a0939e3c40e072dd5d39f615a8b",
      "parents": [
        "acbc988b268327e7a273385f8dbdb4f3e8fd83c1"
      ],
      "author": {
        "name": "Pxl",
        "email": "xl@selectdb.com",
        "time": "Tue Jun 02 10:15:34 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:15:34 2026 +0800"
      },
      "message": "[fix](be) Fix TopN runtime filter activation (#63969)\n\n#59088 changed TopN runtime predicate target\ninitialization to rely on a storage column id. For targets that cannot\ncreate a storage column predicate, such as non-pushdown TopN predicates\nor unsupported storage columns, init_target returned before marking the\ntarget as detected. That left RuntimePredicate disabled, so the scan\nside ignored the TopN source even though FE had sent the source id. This\nPR keeps the target detected when no storage predicate is created,\nremoves obsolete compatibility skips for missing runtime predicate\ndescs, and adds FE/BE coverage for the source marking and no-column\ntarget paths."
    },
    {
      "commit": "acbc988b268327e7a273385f8dbdb4f3e8fd83c1",
      "tree": "fd80b1afc32b80544e31fa4f3d4b5ee2f3093e27",
      "parents": [
        "a2e76e080e74759058406bb0e927160115bbd5e5"
      ],
      "author": {
        "name": "Asish Kumar",
        "email": "87874775+officialasishkumar@users.noreply.github.com",
        "time": "Tue Jun 02 07:43:11 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 10:13:11 2026 +0800"
      },
      "message": "[fix](build) Upgrade Maven from 3.6.3 to 3.9.9 in build-env images (#63600)\n\n### What problem does this PR solve?\n\nIssue Number: close #62412\n\nProblem Summary:\n\nThe FE build enforces Maven `\u003e\u003d 3.9.0` via the `maven-enforcer-plugin`\n(`fe/pom.xml`, `requireMavenVersion` `[3.9.0,)`), which is required by\nthe Maven build cache extension configured under `fe/.mvn`. However,\nevery build-env Docker image under `docker/compilation/` still installs\nMaven **3.6.3**:\n\n- `docker/compilation/Dockerfile` (source of the recommended\n`build-env-ldb-toolchain-latest` image)\n- `docker/compilation/Dockerfile.gcc10`\n- `docker/compilation/Dockerfile.gcc7`\n- `docker/compilation/arm/Dockerfile`\n\nAs a result, building Doris `master` inside the recommended\n`apache/doris:build-env-ldb-toolchain-latest` image fails during the FE\nphase with the enforcer check:\n\n```\n[ERROR] Rule 0: org.apache.maven.enforcer.rules.version.RequireMavenVersion failed with message:\n[ERROR] Detected Maven Version: 3.6.3 is not in the allowed range [3.9.0,).\n```\n\nThis PR upgrades Maven to **3.9.9** in all four build-env Dockerfiles so\nthe images satisfy the enforced version range.\n\nAdditional notes:\n\n- All four images now download the artifact from the permanent\n`archive.apache.org` location, so the pinned version keeps resolving\nafter it rolls off the current-release mirrors (`downloads.apache.org` /\n`dlcdn.apache.org` only retain recent releases).\n- This also repairs the `gcc7` image, which fetched Maven from the\nlong-defunct `mirror.bit.edu.cn` host; its `SHA-512` checksum is updated\nto the official value for `apache-maven-3.9.9-bin.tar.gz`."
    },
    {
      "commit": "a2e76e080e74759058406bb0e927160115bbd5e5",
      "tree": "bb0fa9daed3eb2a09f3c3e4e716ac861ff0a2aef",
      "parents": [
        "f4b06fd895cce3f15ed8b151d1d9473ca9fb14be"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Tue Jun 02 09:57:29 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 09:57:29 2026 +0800"
      },
      "message": "[Fix](nereids) Preserve negative zero sign in SIGNBIT constant folding (#63954)\n\nProblem Summary:\n\n`signbit` in Nereids FE constant folding used `value \u003c 0` to determine\nthe sign, which treats `-0.0` as non-negative and folds it to `false`.\nThis is inconsistent with:\n- the runtime BE implementation, which uses `std::signbit`\n- the documented `signbit` behavior, which distinguishes `+0.0` and\n`-0.0`\n\n`+0.0` and `-0.0` compare equal numerically, so `value \u003c 0` cannot\ndistinguish them. Their difference is only recorded in the\nfloating-point sign bit. Using raw bits makes FE constant folding\nconsistent with BE runtime semantics.\n\nbefore:\n```text\nDoris\u003e set debug_skip_fold_constant\u003dtrue;\nQuery OK, 0 rows affected (0.024 sec)\n\nDoris\u003e select signbit(cast(\u0027+0.0\u0027 as double)) , signbit(cast(\u0027-0.0\u0027 as double));\n+---------------------------------+---------------------------------+\n| signbit(cast(\u0027+0.0\u0027 as double)) | signbit(cast(\u0027-0.0\u0027 as double)) |\n+---------------------------------+---------------------------------+\n|                               0 |                               1 |\n+---------------------------------+---------------------------------+\n1 row in set (0.108 sec)\n\nDoris\u003e set debug_skip_fold_constant\u003dfalse;\nQuery OK, 0 rows affected (0.002 sec)\n\nDoris\u003e select signbit(cast(\u0027+0.0\u0027 as double)) , signbit(cast(\u0027-0.0\u0027 as double));\n+---------------------------------+---------------------------------+\n| signbit(cast(\u0027+0.0\u0027 as double)) | signbit(cast(\u0027-0.0\u0027 as double)) |\n+---------------------------------+---------------------------------+\n|                               0 |                               0 |\n+---------------------------------+---------------------------------+\n```\n\nnow:\n```text\nDoris\u003e set debug_skip_fold_constant\u003dtrue;\nQuery OK, 0 rows affected (0.012 
sec)\n\nDoris\u003e select signbit(cast(\u0027+0.0\u0027 as double)) , signbit(cast(\u0027-0.0\u0027 as double));\n+---------------------------------+---------------------------------+\n| signbit(cast(\u0027+0.0\u0027 as double)) | signbit(cast(\u0027-0.0\u0027 as double)) |\n+---------------------------------+---------------------------------+\n|                               0 |                               1 |\n+---------------------------------+---------------------------------+\n1 row in set (0.070 sec)\n\nDoris\u003e set debug_skip_fold_constant\u003dfalse;\nQuery OK, 0 rows affected (0.002 sec)\n\nDoris\u003e select signbit(cast(\u0027+0.0\u0027 as double)) , signbit(cast(\u0027-0.0\u0027 as double));\n+---------------------------------+---------------------------------+\n| signbit(cast(\u0027+0.0\u0027 as double)) | signbit(cast(\u0027-0.0\u0027 as double)) |\n+---------------------------------+---------------------------------+\n|                               0 |                               1 |\n+---------------------------------+---------------------------------+\n1 row in set (0.010 sec)\n```"
    },
    {
      "commit": "f4b06fd895cce3f15ed8b151d1d9473ca9fb14be",
      "tree": "ee6760d78ae3d7e3e130a63cf6d6b5f2f09408e2",
      "parents": [
        "4901da10194516e5c7875f45034bf8b2cd08898c"
      ],
      "author": {
        "name": "lihangyu",
        "email": "lihangyu@selectdb.com",
        "time": "Mon Jun 01 22:13:24 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 22:13:24 2026 +0800"
      },
      "message": "[fix](variant) fix array subscript on pruned variant subpath (#63891)\n\n### What problem does this PR solve?\n\nFix variant subpath pruning for projections where the top-level\nexpression is an array subscript or `element_at` over a variant subpath.\nThe planner could leave the outer subscript on the original variant\naccess chain after pruning, which made valid 1-based array subscripts\nreturn `NULL`.\n\nThe original array-of-objects repro depends on nested-group variant\nsemantics, so the regression in this PR uses a plain `VARIANT` array\nleaf without nested group. Since that query result is already correct on\ncurrent master, the regression asserts the verbose plan instead: the\nscan uses `subColPath\u003d[items, type]` and the final array subscript is\napplied to the pruned variant slot.\n\n### Check List\n\n- [x] Added regression test\n- [x] Added FE planner unit test\n\n### Tests\n\n- `./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy\n-d variant_p0 -s test_variant_array_subscript`\n- `./run-fe-ut.sh --run\norg.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest` passed\nearlier on the same FE code; rerun after this regression-only amend was\nblocked by system pid/thread exhaustion before test execution."
    },
    {
      "commit": "4901da10194516e5c7875f45034bf8b2cd08898c",
      "tree": "f4314bf7685a319b0928a8db0a1201c209a9f2bf",
      "parents": [
        "47611dceac33814d9df7e80ba85f32dfa8909d18"
      ],
      "author": {
        "name": "Yongtao Huang",
        "email": "yongtaoh2022@gmail.com",
        "time": "Mon Jun 01 19:57:46 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 19:57:46 2026 +0800"
      },
      "message": "[Chore] correct null check in `DictionaryManager.dropTableDictionaries()` (#63630)\n\nLong log:\n\nThe null check was mistakenly performed on the `id` parameter instead of\nthe `dict` returned from `idToDictionary.remove(id)`.\n\nSigned-off-by: Yongtao Huang \u003cyongtaoh2022@gmail.com\u003e"
    },
    {
      "commit": "47611dceac33814d9df7e80ba85f32dfa8909d18",
      "tree": "bf059e670baa2f5bd4c6d2bce0ea61acb7b4cc4f",
      "parents": [
        "f0d256b48c52a7253be9ef97977017c89c8962a9"
      ],
      "author": {
        "name": "nsivarajan",
        "email": "117266407+nsivarajan@users.noreply.github.com",
        "time": "Mon Jun 01 15:55:20 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 18:25:20 2026 +0800"
      },
      "message": "[Fix](Query Stats) Add QueryStatsRecorder for column-level query and filter - Part2 (#63768)\n\n### What problem does this PR solve?\n\nRelated PR: #63067 \n\nProblem Summary:\n\nPR is a Follow-up of #63067 , Extends column-level query/filter hit\nrecording to cover all major Nereids physical plan constructs beyond the\nbase PhysicalOlapScan:\n\n  - Alias resolution: SELECT k1 AS name records k1.queryHit\n  - GROUP BY keys: GROUP BY k1 records k1.queryHit\n  - Aggregate input columns: SUM(k2) records k2.queryHit\n  - ORDER BY columns: ORDER BY k2 records k2.queryHit\n  - Window PARTITION BY / ORDER BY keys\n  - Window value columns: SUM(k2) OVER (...) records k2.queryHit\n- JOIN ON conditions (hash + non-equi): records filterHit on both sides\n  - ROLLUP/CUBE grouping sets via PhysicalRepeat\n  - PartitionTopN partition and order keys (ROW_NUMBER per-partition)\n- Storage-layer aggregate pushdown: COUNT(*)/MIN/MAX queries record\nstats\n  - Lazy materialization scan slot remapping via row-id lookup\n  \nOut of scope (tracked for Part 3):\n\nThe following cases are intentionally deferred and not bugs in this PR:\n- UNION / INTERSECT / EXCEPT — set operation output slots are not yet\nremapped to child scans\n- CTE consumer columns — consumer-side slot IDs differ from producer\nscan slots\n  - LATERAL VIEW / EXPLODE — generator output slots are not yet remapped\n- HAVING SUM(k2) \u003e 0 — aggregate output predicates; simple HAVING k1 \u003e 0\nalready works\n- External tables (Hive / Iceberg / JDBC) — deferred, requires separate\ndesign\n\n---------\n\nCo-authored-by: Sivarajan Narayanan \u003cnarayanan_sivarajan@apple.com\u003e"
    },
    {
      "commit": "f0d256b48c52a7253be9ef97977017c89c8962a9",
      "tree": "3300b56abc5aea74c5f0bbe1403a785a496c1b05",
      "parents": [
        "cada7b9f0fce6e5eca5dd3fd4eb6009b84f82030"
      ],
      "author": {
        "name": "zclllyybb",
        "email": "zhaochangle@selectdb.com",
        "time": "Mon Jun 01 17:42:52 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 17:42:52 2026 +0800"
      },
      "message": "[chore](build) Add doris-skills submodule (#63961)\n\nProblem Summary: Add the\n[apache/doris-skills](https://github.com/apache/doris-skills) repository\nas a root-level submodule and configure it to track the main branch so\nmaintainers can refresh it with git submodule update --remote\ndoris-skills."
    },
    {
      "commit": "cada7b9f0fce6e5eca5dd3fd4eb6009b84f82030",
      "tree": "519be28ccd4b894f2f9e48dd797511b8e7638b10",
      "parents": [
        "627fba17c6f20423e27c428f40069fac495a9a57"
      ],
      "author": {
        "name": "zclllyybb",
        "email": "zhaochangle@selectdb.com",
        "time": "Mon Jun 01 17:10:27 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 17:10:27 2026 +0800"
      },
      "message": "Revert \"[Feature](skill) Introduce Doris profile reader skill\" (#63959)\n\nReverts apache/doris#63948"
    },
    {
      "commit": "627fba17c6f20423e27c428f40069fac495a9a57",
      "tree": "2d881091583f383b19934fbd7a115f5d9107d16b",
      "parents": [
        "138ab5cb1f4b07601c745fe701e7ca72c389dd35"
      ],
      "author": {
        "name": "Zhen Chen",
        "email": "czjourney@163.com",
        "time": "Mon Jun 01 16:50:56 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:50:56 2026 +0800"
      },
      "message": "[chore](doc) Improve README formatting for clarity (#63905)"
    },
    {
      "commit": "138ab5cb1f4b07601c745fe701e7ca72c389dd35",
      "tree": "083702bac8c453d75a9b0881a7415aef33d45828",
      "parents": [
        "c0841744d1dd1b7e13a90097f1d315da06bd2226"
      ],
      "author": {
        "name": "zhengyu",
        "email": "zhangzhengyu@selectdb.com",
        "time": "Mon Jun 01 16:43:38 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:43:38 2026 +0800"
      },
      "message": "[fix](filecache) fix clear_file_cache right after reboot causing file cache size percent overflow (#63410)\n\nProblem Summary: When file cache LRU restore creates a block from dump\nmetadata and later lazy loading finds the same hash/offset with a\nsmaller real file size, reset_range only updated the LRU queue size and\n_cur_cache_size. The FileBlock range still kept the old restored size,\nso a later async clear or eviction subtracted the old block size and\ncould underflow _cur_cache_size, producing huge size_percent values in\nneed-evict-cache-in-advance logs. This change makes reset_range update\nthe FileBlock range as the single place that keeps the FileBlock, LRU\nqueue, _cur_cache_size, and TTL size accounting consistent.\nFileBlock::finalize now delegates the range shrink to reset_range\ninstead of changing the range before calling it."
    },
    {
      "commit": "c0841744d1dd1b7e13a90097f1d315da06bd2226",
      "tree": "4d6e1db87215c3928923e0be38d9c5970fae6ab7",
      "parents": [
        "c7449c6434fc6899af59bbc31082d93d38c30970"
      ],
      "author": {
        "name": "zhengyu",
        "email": "zhangzhengyu@selectdb.com",
        "time": "Mon Jun 01 16:41:10 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:41:10 2026 +0800"
      },
      "message": "[fix](filecache) add async lru update mechanism and fix partial hit in cache reader (#61083)\n\n- CachedRemoteFileReader::read_at_impl has incorrect initialization of\nsubsequent traversal start point and count after direct partial hit\n(causes incorrect fallback / extra overhead)\n- Proposed LRU ordering async update solution: Decoupled LRU update from\nquery read operations, slightly reducing read lock latency and laying\ngroundwork for subsequent lock splitting\n- Established performance unit tests for cache lock\n\nSigned-off-by: zhengyu \u003czhangzhengyu@selectdb.com\u003e"
    },
    {
      "commit": "c7449c6434fc6899af59bbc31082d93d38c30970",
      "tree": "8971bcfa6bf39c1c98380cc96157b4bcfe33d427",
      "parents": [
        "0b3d70c14071fa7416bed2aface4d9aa94a40d5a"
      ],
      "author": {
        "name": "zhengyu",
        "email": "zhangzhengyu@selectdb.com",
        "time": "Mon Jun 01 16:36:26 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:36:26 2026 +0800"
      },
      "message": "[fix](filecache) avoid crash when late holder cleanup sees removed cache cell (#62437)\n\nProblem Summary:\nWhen FileBlocksHolder is destroyed late, the corresponding file block\nmay already be removed or replaced in block file cache metadata.\n`BlockFileCache::remove()` can dereference a stale cache cell during\nduplicate cleanup and crash.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [x] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [x] Other reason \u003c!-- Add your reason?  --\u003e\n- Cherry-pick only. The picked commits already include BE unit-test\ncoverage, and no local build/test was requested for this task.\n\n- Behavior changed:\n    - [ ] No.\n    - [x] Yes. \u003c!-- Explain the behavior change --\u003e\n- Avoid crash when late holder cleanup sees a removed or replaced cache\ncell, add warning logs for skipped duplicate remove, and add BE unit\ntests for the stale/replaced-cell cleanup paths.\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "0b3d70c14071fa7416bed2aface4d9aa94a40d5a",
      "tree": "e1e6d5f723d20087c7e595e346cf1d485ae4abe7",
      "parents": [
        "679081e39dbba92e2184462c760c63e51140e919"
      ],
      "author": {
        "name": "hui lai",
        "email": "laihui@selectdb.com",
        "time": "Mon Jun 01 16:31:45 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:31:45 2026 +0800"
      },
      "message": "[fix](transaction) select txn insert backend from current cluster (#63634)\n\n### What problem does this PR solve?\n\nProblem Summary:\n\nIn cloud mode with multiple compute groups, transactional `insert into\nvalues` may fail with:\n\n`Cannot invoke \"org.apache.doris.system.Backend.getHost()\" because\n\"backend\" is null`\n\nThe root cause is that `InsertStreamTxnExecutor` selected a backend id\nfrom all clusters through `selectBackendIdsByPolicy(policy, 1)`, but\nthen looked up the selected id from `getBackendsByCurrentCluster()`. If\nthe selected backend belonged to another compute group, the lookup\nreturned null and FE hit an NPE when calling `backend.getHost()`.\n\nThis PR changes txn insert backend selection to use the current cluster\nbackend snapshot as the candidate list, so the selected backend is\nalways from the current compute group."
    },
    {
      "commit": "679081e39dbba92e2184462c760c63e51140e919",
      "tree": "f009988d4b2b35b7d1812d5f9e061fa589f343e5",
      "parents": [
        "f68eda67aadf25c767c1883aa843fe011997f0b6"
      ],
      "author": {
        "name": "minghong",
        "email": "zhouminghong@selectdb.com",
        "time": "Mon Jun 01 16:12:01 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:12:01 2026 +0800"
      },
      "message": "[feature](runtime filter) Add decoupled runtime filter support (#62737)\n\n### What problem does this PR solve?\n\nSupport decoupled runtime filters (RF). For a plan shaped like\n(A join B) join (C join D), the decoupled RFs are B-\u003eC and B-\u003eD.\n\n### Release note\n\nNone\n\n---------\n\nCo-authored-by: Copilot \u003c223556219+Copilot@users.noreply.github.com\u003e"
    },
    {
      "commit": "f68eda67aadf25c767c1883aa843fe011997f0b6",
      "tree": "6bea8ef0857ce756841bdba9f174abaa3d69c3c6",
      "parents": [
        "5db57341993609fab70e090115818ce4bee51799"
      ],
      "author": {
        "name": "zclllyybb",
        "email": "zhaochangle@selectdb.com",
        "time": "Mon Jun 01 16:06:52 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:06:52 2026 +0800"
      },
      "message": "[Feature](skill) Introduce Doris profile reader skill (#63948)\n\nAdd script to extract Doris profile operator/counter names and update\nAGENTS.md\n\n- Introduced a new Python script `extract_source_profile_inventory.py`\nthat extracts operator and counter names from the Doris source tree,\ngenerating a markdown inventory.\n- Updated AGENTS.md to include a requirement for formatting BE code with\nthe correct skill before committing if modifications are made."
    },
    {
      "commit": "5db57341993609fab70e090115818ce4bee51799",
      "tree": "f72e4a567667846c5cfd3ee9461557056f5590c3",
      "parents": [
        "d85172722133c3deda1ef87b43dc4e6202651871"
      ],
      "author": {
        "name": "foxtail463",
        "email": "foxtail463@gmail.com",
        "time": "Mon Jun 01 16:04:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 16:04:36 2026 +0800"
      },
      "message": "[Enhancement](mv): Improve MV predicate compensation and keep original min-max predicates non-inferred (#61345)\n\nProblem Summary:\n\nThe old residual compensation simply checked whether the query residual\nset contained all view residual predicates. This breaks on real-world MV\nrewrites where query and view residuals are structurally different but\nlogically implicative, for example:\n\n  query residual: A OR (B AND C)\n  view residual : A OR B\n\nThe old code sees two different expression trees and bails out, even\nthough the query side is strictly stronger.\n\nThis patch introduces DNF-based implication checking (impliesByDnf) to\nreplace the set-containment approach, so compensation succeeds whenever\nthe query candidates logically imply the view residual regardless of\nstructural differences. A hard cap (MAX_DNF_BRANCHES\u003d1024) guards\nagainst exponential expansion; when the proof is too expensive,\ncompensation fails conservatively rather than hanging the optimizer.\n\nThis patch also fixes predicate provenance in AddMinMax. AddMinMax may\nderive min/max predicates and then move equivalent boundary predicates\nfrom the original expression into the generated min/max list. If the\nboundary predicate already existed in the original SQL, it must remain\nnon-inferred even after being moved; otherwise MV compensation may later\nfilter it out as an inferred predicate and lose a real query boundary.\nPurely generated min/max predicates are still marked as inferred.\n\nThe three separate compensate calls in AbstractMaterializedViewRule are\ncollapsed into a single Predicates.compensatePredicates entry point that\nencapsulates candidate collection and residual finalization.\n\n---------\n\nCo-authored-by: yangtao555 \u003cyangtao555@jd.com\u003e"
    },
    {
      "commit": "d85172722133c3deda1ef87b43dc4e6202651871",
      "tree": "98b2d659bb297ce84aa8b9ea096dce1e3a94f475",
      "parents": [
        "d7f9fa57f2a51867117a6e0987db6546a0029f37"
      ],
      "author": {
        "name": "lihangyu",
        "email": "lihangyu@selectdb.com",
        "time": "Mon Jun 01 15:27:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 15:27:32 2026 +0800"
      },
      "message": "[fix](variant) Remove deprecated flatten nested setting from P1 regression (#63840)\n\ncherry-pick #61466"
    },
    {
      "commit": "d7f9fa57f2a51867117a6e0987db6546a0029f37",
      "tree": "bf8408ac74c1cd3ab5d8334fc722c55d65b051e2",
      "parents": [
        "c24323874675762d1b42b8487b3dcd9b5ed81061"
      ],
      "author": {
        "name": "yiguolei",
        "email": "guolei@selectdb.com",
        "time": "Mon Jun 01 15:08:22 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 15:08:22 2026 +0800"
      },
      "message": "[refactor](be)simplify interface in schema and rowcursor (#63925)\n\n### Release note\n\nNone"
    },
    {
      "commit": "c24323874675762d1b42b8487b3dcd9b5ed81061",
      "tree": "2feaab179d65cf3f8d8f030ed487415bdb0a30f9",
      "parents": [
        "04624351573ea14ae15e3ce06ba7b4e206643918"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Mon Jun 01 14:51:34 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 14:51:34 2026 +0800"
      },
      "message": "[Enhancement](udf) support volatility for udaf \u0026\u0026 udtf (#63611)\n\nRelated PR: https://github.com/apache/doris/pull/62698\n\nProblem Summary:\n\nAdd volatility metadata support for UDAF and UDTF definitions, so\nuser-defined aggregate and table functions can preserve and expose their\nvolatility semantics consistently with UDFs."
    },
    {
      "commit": "04624351573ea14ae15e3ce06ba7b4e206643918",
      "tree": "64b836aabc701c824d074e1d2aaa23d4ff8824f8",
      "parents": [
        "2ad56a85edb9678b1b033fa3ce2d845bfda834ba"
      ],
      "author": {
        "name": "heguanhui",
        "email": "hgh_wy163mail@163.com",
        "time": "Mon Jun 01 14:43:46 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 14:43:46 2026 +0800"
      },
      "message": "[fix](be ut) Skip custom memcpy on ARM+ASAN to fix segfault at process startup (#63656)\n\nThe glibc-compatibility module provides a custom memcpy implementation\n(memcpy_aarch64.cpp) that overrides the global memcpy symbol via extern\n\"C\". This is done to avoid dependency on a specific glibc symbol version\n(e.g., memcpy@@GLIBC_2.14) for portability.\n\nHowever, libpthread\u0027s __pthread_initialize_minimal() calls memcpy during\nvery early process startup: before main(), before C++ static\ninitialization, and before ASAN shadow memory is set up. When ASAN is\nenabled, the custom memcpy accesses memory that ASAN shadow has not yet\nmapped, resulting in SIGSEGV.\n\nThis only affects aarch64 + ASAN because:\n\n- RELEASE builds have no ASAN shadow memory checks\n- x86_64 + ASAN does not exhibit this crash (different shadow memory\nlayout and initialization timing)\n\nCo-authored-by: root \u003croot@DESKTOP-3AF37B\u003e"
    },
    {
      "commit": "2ad56a85edb9678b1b033fa3ce2d845bfda834ba",
      "tree": "b1e0bfe7833a2d4e384cfb09f92b258925371232",
      "parents": [
        "a9a87f86796fa7432a5f66b6d2d83c8c953d3e82"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Mon Jun 01 14:42:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 14:42:32 2026 +0800"
      },
      "message": "[refine](column) strong type array and map offsets (#63678)\n\n### What problem does this PR solve?\n\nArray and map offset subcolumns were still stored and exposed through\ngeneric IColumn pointers, so callers had to repeatedly downcast them\nback to their concrete offset column types. Root cause: the ownership,\nCOW mutation, deserialization, and segment reader write-back paths still\ntreated these offsets as generic subcolumns, which left the\nstrong-typing change incomplete and kept redundant same-type assert_cast\nusage in affected callers and tests. This change promotes ColumnArray\noffsets and ColumnMap offsets_column to typed wrapped pointers, updates\nthe typed accessors and generic write-back paths to preserve the\nconcrete offset column type, and removes the now-redundant casts in\narray functions, lambda functions, and unit tests. It also cleans up one\nredundant nullable null-map cast uncovered during the BE UT compile.\n\n### Release note\n\nNone"
    },
    {
      "commit": "a9a87f86796fa7432a5f66b6d2d83c8c953d3e82",
      "tree": "ae0268d6177e45db0736da9ea5cb1b31c7dd8080",
      "parents": [
        "3f5582b3acef9c7088c7422832190806729f420c"
      ],
      "author": {
        "name": "Yixuan Wang",
        "email": "wangyixuan@selectdb.com",
        "time": "Mon Jun 01 14:36:39 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 14:36:39 2026 +0800"
      },
      "message": "[chore](cloud) Support dynamic recycler instance filter config (#63822)\n\nRead recycler whitelist and blacklist directly from config when scanning\ninstances, so runtime config updates can affect filtering without\nrestart.\nAdd a unit test for dynamic filter changes."
    },
    {
      "commit": "3f5582b3acef9c7088c7422832190806729f420c",
      "tree": "f2b50f4520e0a09677ef8cc2a6ec0416e3d1bf98",
      "parents": [
        "d898a1d90d66a00a0fb465fda51cc867fae79759"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Mon Jun 01 12:30:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 12:30:40 2026 +0800"
      },
      "message": "[fix](fe) Skip dropped columns in follower stats sync (#63882)\n\n### What problem does this PR solve?\n\nFollowerColumnSender drains queued column references on follower FEs and\nsyncs the columns that still need analysis to the master. A queued\ncolumn can become stale after DDL changes. If the table still exists but\nthe queued column has been dropped, table.getColumn(column.colName)\nreturns null and the sender throws a NullPointerException while reading\nthe type.\n\nThis patch skips dropped columns before checking the column type, so the\ndaemon does not emit periodic ERROR logs and can continue processing the\nremaining queued columns."
    },
    {
      "commit": "d898a1d90d66a00a0fb465fda51cc867fae79759",
      "tree": "d62392b034fb21f1ab3bcb1a660f96d671f49bdd",
      "parents": [
        "18677371380da5b4b4b290cd65e2a7cefdccb794"
      ],
      "author": {
        "name": "Calvin Kirs",
        "email": "guoqiang@selectdb.com",
        "time": "Mon Jun 01 12:26:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 12:26:47 2026 +0800"
      },
      "message": "[feature](fe) Push down limit into CTE producer  (#63675)\n\nThis PR adds CTE producer-side limit pushdown in Nereids.\n\nWhen all CTE consumers only need a bounded number of rows, the optimizer\ncollects the required row count from each consumer, takes\nthe maximum value, and pushes that limit into the CTE producer. The\noriginal consumer-side limit is still kept.\n\n  The rule only handles safe shapes:\n\n  ```text\n  LogicalLimit\n    LogicalCTEConsumer\n  ```\n\n  ```text\n  LogicalLimit\n    LogicalProject\n      LogicalCTEConsumer\n  ```\n\n  The project must be row-preserving.\n\n  ## Scenarios\n\n  ### 1. Direct Limit\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte\n  LIMIT 10;\n  ```\n\nThe consumer only needs 10 rows, so the CTE producer can produce at most\n10 rows.\n\n  ### 2. Project + Limit\n\n  ```sql\n  WITH cte AS (\n      SELECT order_id, total_price, user_id FROM orders\n  )\n  SELECT order_id, total_price\n  FROM cte\n  LIMIT 10;\n  ```\n\nA normal project only prunes columns and does not change row count, so\nthe producer can still be limited to 10 rows.\n\n  ### 3. Multiple Consumers + Limit\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte LIMIT 10\n  UNION ALL\n  SELECT * FROM cte LIMIT 20;\n  ```\n\n  For multiple CTE consumers, the producer limit is:\n\n  ```text\n  producerLimit \u003d max(consumerLimit1, consumerLimit2, ...)\n  ```\n\n  In this case, the pushed producer limit is 20.\n\n  If any consumer needs full CTE data, pushdown is skipped:\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte LIMIT 10\n  UNION ALL\n  SELECT * FROM cte;\n  ```\n\n  ### 4. 
Limit + Offset\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte\n  LIMIT 10 OFFSET 100;\n  ```\n\nThe consumer needs to skip 100 rows and then return 10 rows, so the\nproducer must provide at least 110 rows.\n\n  The producer side only truncates rows and does not apply offset:\n\n  ```text\n  producerLimit \u003d limit + offset\n  producerOffset \u003d 0\n  ```\n\n  ### 5. SplitLimit\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte\n  LIMIT 10 OFFSET 100;\n  ```\n\nDoris may split this into local/global limits. The local limit closest\nto the CTE consumer already represents `limit + offset`.\n\nThe collector uses the local limit value directly and does not add\noffset again.\n\n  ### 6. Filter + Limit Is Not Matched\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte\n  WHERE order_id \u003e 10\n  LIMIT 10;\n  ```\n\nFilter can reduce rows before limit, so the producer may need more than\n10 input rows. This rule does not push limit through filter.\n\n  ### 7. TopN Is Not Matched\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT * FROM cte\n  ORDER BY order_id\n  LIMIT 10;\n  ```\n\n`ORDER BY ... LIMIT` is TopN. It needs the first N rows after ordering,\nso it cannot be treated as a normal limit.\n\n  ### 8. 
Join / Aggregate / Window / Sort Are Not Matched\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT *\n  FROM cte JOIN users ON cte.user_id \u003d users.user_id\n  LIMIT 10;\n  ```\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT user_id, COUNT(*)\n  FROM cte\n  GROUP BY user_id\n  LIMIT 10;\n  ```\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT *\n  FROM (\n      SELECT order_id, ROW_NUMBER() OVER (ORDER BY order_id) AS rn\n      FROM cte\n  ) t\n  LIMIT 10;\n  ```\n\n  ```sql\n  WITH cte AS (\n      SELECT * FROM orders\n  )\n  SELECT *\n  FROM (\n      SELECT * FROM cte ORDER BY order_id\n  ) t\n  LIMIT 10;\n  ```\n\nThese operators can change row cardinality or ordering semantics. Unless\nother rules have already rewritten the shape into `Limit -\u003e\nCTEConsumer` or `Limit -\u003e Project -\u003e CTEConsumer`, this collector skips\nthem."
    },
    {
      "commit": "18677371380da5b4b4b290cd65e2a7cefdccb794",
      "tree": "46a408de397324dea4726246f2c7d7c0bf2f6c3a",
      "parents": [
        "905c80433b1714027bc853b870de77eb415732e7"
      ],
      "author": {
        "name": "morrySnow",
        "email": "zhangwenxin@selectdb.com",
        "time": "Mon Jun 01 12:24:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 12:24:36 2026 +0800"
      },
      "message": "[fix](fe) Fix assert row join pushdown alias handling (#63892)\n\n### What problem does this PR solve?\n\nRelated PR: #57414\n\nProblem Summary: A scalar subquery comparison can reference a projected\nalias from the right side of an inner join. PushDownJoinOnAssertNumRows\npreviously identified the pushed condition slots against the project\noutput after rewriting the condition through the project, so aliases\nexpanded to right-child slots could be treated as if no bottom-join\nslots were involved and the alias projection could be attached to the\nleft child. The rewritten plan then referenced slots that were absent\nfrom that child. This change determines slot ownership from the bottom\njoin output after project pushdown, keeps the original pushdown child\norder when assembling the new join, and adds a unit test for the\nright-child alias case.\n\n### Release note\n\nFix query planning failure for scalar subquery comparisons on projected\njoin expressions."
    },
    {
      "commit": "905c80433b1714027bc853b870de77eb415732e7",
      "tree": "ae36c610d524fe329171d9d29efa97f6844355fb",
      "parents": [
        "8db9a80d120c1b1c581edcbd30d6a82cd634bf81"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Mon Jun 01 12:18:26 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 12:18:26 2026 +0800"
      },
      "message": "[fix](expr) fix mixed const probe constant handling regressions (#63810)\n\nThe mixed const execution probe exposed several constant-handling\nproblems in BE vectorized functions.\n\n- ColumnConst::clone_resized reused the original nested column, so\ncloned const columns could still alias the source data.\n- quantile_percent requires its percentile argument to stay constant,\nbut the all-const probe path unpacked it and triggered a false\nconstant-check failure.\n- regexp_count accessed string columns directly and did not handle mixed\nconst inputs correctly.\n- uniform still went through the default constant implementation even\nthough its result depends on per-row seed values.\n\nThis change fixes those behaviors and adds focused unit tests for the\nuncovered cases."
    },
    {
      "commit": "8db9a80d120c1b1c581edcbd30d6a82cd634bf81",
      "tree": "e0cfc3836fb0c6ff36ceb28b52c751b23eb55895",
      "parents": [
        "5b5b2ae1330e882eb34b4a4a4a627bfdb380dd70"
      ],
      "author": {
        "name": "HonestManXin",
        "email": "HonestManXin@gmail.com",
        "time": "Mon Jun 01 11:57:48 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:57:48 2026 +0800"
      },
      "message": "[fix](point-query) Refresh stale prepared short-circuit plans (#63920)\n\nPrepared point queries cache short-circuit planner state and the\nassociated StatementContext. When a target table is renamed, replaced,\nor swapped, the cached plan may still reference stale table metadata and\nreuse an outdated StatementContext, causing execution failures or\nincorrect table access.\n\nThis change tracks the table name captured during planning, checks\ndropped/renamed/schema-changed tables before reusing the short-circuit\ncontext, and reparses the prepared SQL to rebuild the prepared plan with\na fresh StatementContext when the cached context is stale. Bound\nplaceholder values and MySQL parameter types are preserved across\nrefresh. Regression coverage is added for rename, replace, and swap\nscenarios."
    },
    {
      "commit": "5b5b2ae1330e882eb34b4a4a4a627bfdb380dd70",
      "tree": "e2288b520aa23b9142436e4f06dd272d144f58a9",
      "parents": [
        "eeef49eafdc121df570f4ab7f1e1c380a35d9e53"
      ],
      "author": {
        "name": "Yixuan Wang",
        "email": "wangyixuan@selectdb.com",
        "time": "Mon Jun 01 11:54:38 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:54:38 2026 +0800"
      },
      "message": "[feat](cloud) Add system rate limit for meta-service (#61516)\n\n## Summary\n\nThis PR introduces an automatic rate limiting mechanism for the Meta\nService (MS) in Doris Cloud. When the Meta Service or its underlying\nFoundationDB (FDB) cluster is under heavy load, incoming RPC requests\nwill be proactively rejected with a `MS_TOO_BUSY` error code, preventing\ncascading failures and protecting system stability.\n\n## Motivation\n\nIn production environments, the Meta Service can become overwhelmed due\nto high concurrency, FDB cluster performance degradation, or resource\nexhaustion (CPU/memory). Without a self-protection mechanism, this can\nlead to cascading failures, elevated latencies, and potential\nsystem-wide outages. This change adds a multi-dimensional stress\ndetection system that automatically throttles requests when the system\nis under significant pressure.\n\n## Design\n\n### Stress Detection Dimensions\n\nThe rate limiter evaluates system stress across three independent\ndimensions, any of which can trigger rate limiting:\n\n1. **FDB Cluster Pressure** (`fdb_cluster_under_pressure`)\n- Triggered when FDB commit latency exceeds\n`ms_rate_limit_fdb_commit_latency_ms` (default: 50ms) **OR** FDB read\nlatency exceeds `ms_rate_limit_fdb_read_latency_ms` (default: 5ms)\n- **AND** the FDB `performance_limited_by` indicator reports a\nnon-workload bottleneck (e.g., storage server, log server)\n- This ensures rate limiting only kicks in when FDB itself is the\nbottleneck, not when the cluster is simply handling a normal high\nworkload\n\n2. 
**FDB Client Thread Pressure** (`fdb_client_thread_under_pressure`)\n- Uses a sliding window (default: 60 seconds) to compute the average FDB\nclient thread busyness percentage\n- Triggered when the window average exceeds\n`ms_rate_limit_fdb_client_thread_busyness_avg_percent` (default: 70%)\n**AND** the instantaneous busyness exceeds\n`ms_rate_limit_fdb_client_thread_busyness_instant_percent` (default:\n90%)\n- The dual-threshold (average + instant) design avoids false positives\nfrom transient spikes\n\n3. **MS Process Resource Pressure** (`ms_resource_under_pressure`)\n   - Monitors the Meta Service process\u0027s own CPU and memory usage\n- Triggered when CPU usage (both current and window average) exceeds\n`ms_rate_limit_cpu_usage_percent` (default: 95%) **OR** memory usage\n(both current and window average) exceeds\n`ms_rate_limit_memory_usage_percent` (default: 95%)\n- CPU usage is calculated via `getrusage()` delta over wall-clock time,\nnormalized by CPU core count\n- Memory usage is read from `/proc/self/status` (VmRSS) relative to\ntotal system memory via `sysinfo()`\n\n### Sliding Window Mechanism\n\n- A `MsStressDetector` class maintains a `std::deque\u003cWindowSample\u003e` of\nper-second samples\n- Each sample records: FDB client thread busyness, MS CPU usage, MS\nmemory usage\n- Samples outside the configured window (`ms_rate_limit_window_seconds`,\ndefault: 60s) are evicted\n- Window averages are only considered valid when the window is fully\npopulated (i.e., the time span of samples covers the full window)\n\n### Request Rejection Flow\n\n- The `RPC_PREPROCESS` macro in `meta_service_helper.h` is augmented\nwith rate limit checking logic\n- Before processing any RPC request, `get_ms_stress_decision()` is\ncalled to collect current metrics and evaluate stress\n- If `under_greate_stress()` returns true, the request is immediately\nrejected with `MetaServiceCode::MS_TOO_BUSY` (6002) and a detailed debug\nstring describing the trigger reason\n- On 
the BE side (`cloud_meta_mgr.cpp`), the `MS_TOO_BUSY` error code is\nrecognized and the error message is propagated\n\n### Fault Injection for Testing\n\n- A fault injection mechanism is included for testing rate limiting\nbehavior without actual system stress\n- Controlled by `enable_ms_rate_limit_injection` (default: false) and\n`ms_rate_limit_injection_probability` (default: 5%, range: 0-100)\n- When enabled, each request has a configurable probability of being\nartificially rate-limited\n- Uses thread-local `std::mt19937` random number generator for\nefficiency\n\n### FDB Performance Limited By Metric\n\n- A new bvar `g_bvar_fdb_performance_limited_by_name` is added to track\nthe FDB `performance_limited_by.name` field from the FDB status JSON\n- The value is mapped to: `0` if the limiter is \"workload\" (normal),\n`-1` otherwise (indicating an infrastructure bottleneck)\n- This metric is collected in `metric.cpp` via a new `get_string_value`\nlambda that parses the FDB status JSON\n\n## Configuration Parameters\n\n| Parameter | Type | Default | Description |\n|---|---|---|---|\n| `enable_ms_rate_limit` | Bool | `true` | Master switch for rate\nlimiting |\n| `enable_ms_rate_limit_injection` | mBool | `false` | Enable fault\ninjection for testing |\n| `ms_rate_limit_injection_probability` | mInt32 | `5` | Injection\nprobability (0-100%) |\n| `ms_rate_limit_window_seconds` | mInt64 | `60` | Sliding window size\nin seconds |\n| `ms_rate_limit_fdb_commit_latency_ms` | mInt64 | `50` | FDB commit\nlatency threshold (ms) |\n| `ms_rate_limit_fdb_read_latency_ms` | mInt64 | `5` | FDB read latency\nthreshold (ms) |\n| `ms_rate_limit_fdb_client_thread_busyness_avg_percent` | mInt64 | `70`\n| FDB client thread avg busyness threshold (%) |\n| `ms_rate_limit_fdb_client_thread_busyness_instant_percent` | mInt64 |\n`90` | FDB client thread instant busyness threshold (%) |\n| `ms_rate_limit_cpu_usage_percent` | mInt64 | `95` | MS process CPU\nusage threshold (%) |\n| 
`ms_rate_limit_memory_usage_percent` | mInt64 | `95` | MS process\nmemory usage threshold (%) |\n\nAll parameters whose type is prefixed with `m` (e.g. `mInt64`) are mutable at runtime\nwithout restart.\n\n## Update RPC rate limit whitelist\nSet the whitelist:\n```\ncurl -X POST http://\u003cmeta-service-host\u003e:\u003cport\u003e/MetaService/http/set_rpc_rate_limit_whitelist \\\n-H \"Content-Type: application/json\" \\\n-d \u0027{\n  \"rpcs\": [\"commit_txn\", \"begin_txn\", \"get_txn\"]\n}\u0027\n```\nGet the current whitelist (the JSON after the command is the response):\n```\ncurl http://\u003cmeta-service-host\u003e:\u003cport\u003e/MetaService/http/get_rpc_rate_limit_whitelist\n{\n\"rpcs\": [\n  \"commit_txn\",\n  \"begin_txn\",\n  \"get_txn\"\n]\n}\n```\nClear the whitelist:\n```\ncurl -X POST http://\u003cmeta-service-host\u003e:\u003cport\u003e/MetaService/http/set_rpc_rate_limit_whitelist \\\n-H \"Content-Type: application/json\" \\\n-d \u0027{\"rpcs\": []}\u0027\n```"
    },
    {
      "commit": "eeef49eafdc121df570f4ab7f1e1c380a35d9e53",
      "tree": "62d0dade1d3ee06e42a365b102d2942649e0b882",
      "parents": [
        "ba86267294c1f0f93d01a06c7a8d26284280796b"
      ],
      "author": {
        "name": "hui lai",
        "email": "laihui@selectdb.com",
        "time": "Mon Jun 01 11:42:12 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:42:12 2026 +0800"
      },
      "message": "[enhance](job) add zero-row hint for Kafka read_committed load (#63664)\n\nWhen Kafka routine load is configured with\n`isolation.level\u003dread_committed`, the consumer may consume 0 rows while\nthe task partition lag is still positive. This can happen when upstream\nproducers use Kafka transactions and some records are not committed yet,\nso they are invisible to the read_committed consumer.\n\nThis PR adds an `OtherMsg` hint for this case:\n- routine load task consumes 0 rows\n- task partition lag is positive\n- Kafka property `isolation.level\u003dread_committed` is configured\n\nThe message helps users distinguish this case from ordinary no-data\nconsumption. A FE debug point and regression test are added to ensure\nthe warning can be reported deterministically."
    },
    {
      "commit": "ba86267294c1f0f93d01a06c7a8d26284280796b",
      "tree": "9cae3be40d214db60edb56d8999507fd188f025b",
      "parents": [
        "d1e30df5565214d2180c5a89e69f76e2a50fba16"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Mon Jun 01 11:31:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:31:04 2026 +0800"
      },
      "message": "[fix](regression) fix unstable test_audit_log_internal_query_failure due to other cases modify global vars (#63030)\n\nRelated PR: #62908"
    },
    {
      "commit": "d1e30df5565214d2180c5a89e69f76e2a50fba16",
      "tree": "33ded32da1d003ab03caa4193170a538b5a3e85f",
      "parents": [
        "7a79dd88a5f0b5c6e058f9c2f31d430174b83d07"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Mon Jun 01 11:14:21 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:14:21 2026 +0800"
      },
      "message": "[fix](cloud) Align colocate proc output and tablet health in cloud mode (#60944)\n\n- Fix incorrect tablet health statistics in cloud mode for SHOW PROC \u0027/cluster_health/tablet_health\u0027: avoid reporting UNRECOVERABLE due to local-mode health checks.\n- Add cloud fallback for colocation group detail: when backend sequence metadata is empty, derive backend ids from current tablets.\n- In cloud mode, show ReplicaAllocation as null in SHOW PROC \u0027/colocation_group/{GroupId}\u0027 for consistent output semantics.\n- Keep local mode behavior unchanged."
    },
    {
      "commit": "7a79dd88a5f0b5c6e058f9c2f31d430174b83d07",
      "tree": "40f951d5c31404ff62fc9e33ffbe247d8992e39b",
      "parents": [
        "17617be150e43fb749cf77cd55adc0b08a2f3afb"
      ],
      "author": {
        "name": "daidai",
        "email": "changyuwei@selectdb.com",
        "time": "Mon Jun 01 11:12:53 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:12:53 2026 +0800"
      },
      "message": "[fix](iceberg)fix iceberg v3 row lineage count distinct error result (#63826)\n\nCritical checkpoint conclusions:\n\nGoal and proof: The PR fixes Parquet Iceberg v3 row-lineage-only queries where _row_id is synthesized without reading physical table columns. The code now keeps the row positions generated inside _read_empty_batch instead of recomputing them after _total_read_rows has advanced, and the regression adds distinct, group by, count(distinct), and ndv coverage for _row_id-only reads.\nScope: The code change is small and focused on Parquet row-position preparation. The test addition is directly tied to the bug.\nConcurrency and lifecycle: No new shared state, threads, locks, static initialization, or lifecycle ownership changes were introduced.\nConfiguration and compatibility: No new config, persisted format, RPC, or FE/BE protocol compatibility changes.\nParallel code paths: The fix applies to the Parquet path where the bug is introduced by _read_empty_batch and _total_read_rows; the regression intentionally gates the aggregate-only check to Parquet.\nConditional checks: The new _need_current_batch_row_positions() helper preserves the existing synthesized/generated handler condition and removes duplicated condition logic.\nTest coverage: Added external Iceberg regression coverage for row-lineage-only aggregate/distinct reads before and after an insert. I did not run the regression locally in this GitHub Actions review session.\nObservability, transactions, persistence, data writes, metrics, and memory tracking: Not applicable to this change.\nUser focus: No additional user-provided focus points were supplied."
    },
    {
      "commit": "17617be150e43fb749cf77cd55adc0b08a2f3afb",
      "tree": "f1c0f472ed1c8e318910224c8174349d3e75e405",
      "parents": [
        "a1f66eb96f78a3ae46c03f8f32fdfce58c17ffd3"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Mon Jun 01 10:38:23 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 10:38:23 2026 +0800"
      },
      "message": "[fix] (cloud) Fix local/remote tablet size semantics in schema views (#60887)\n\nIn storage-compute separation, data size should be represented\nconsistently as remote size.\n    \nPreviously, show tablets and information_schema.partitions could diverge\nfrom information_schema.backend_tablets, which made local/remote\nsemantics confusing for users and operators.\n    \nThis change aligns cloud-mode output mapping for local/remote size\ncolumns and adds a regression test to guard the behavior."
    },
    {
      "commit": "a1f66eb96f78a3ae46c03f8f32fdfce58c17ffd3",
      "tree": "f5a4abc4ec2e57f509df31ca2cc055cff31a5e88",
      "parents": [
        "4a0c58bcbcdb2f257f64ef2a856d39d0186bf7cd"
      ],
      "author": {
        "name": "Socrates",
        "email": "suyiteng@selectdb.com",
        "time": "Mon Jun 01 10:19:31 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 10:19:31 2026 +0800"
      },
      "message": "[fix](fe) Keep cached file systems alive while in use (#63677)\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\nHive file listing and Hive ACID state loading borrow `FileSystem`\ninstances from `FileSystemCache` and keep using them while checking\nsplitability, listing remote files, or loading ACID state.\n\nThe previous cache implementation returned the raw cached `FileSystem`.\nWhen a cache entry was evicted or expired, the Caffeine removal listener\nclosed that same instance immediately. If cache cleanup happened while\nanother thread was still using the returned instance, the active Hive\noperation could observe a closed filesystem.\n\nThis PR fixes the lifecycle in `FileSystemCache` instead of bypassing\nthe cache at Hive call sites. Cached filesystems are now returned\nthrough leases backed by a holder with an active reference count. Cache\neviction marks the holder as evicted, and the underlying filesystem is\nclosed only after the last active lease is released. If the filesystem\ncache is disabled, the direct lease owns the newly created filesystem\nand closes it when released. Hive file listing and ACID paths now use\ntry-with-resources to hold the lease for the whole filesystem usage\nwindow.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [x] Unit Test\n- `./run-fe-ut.sh --run\norg.apache.doris.fs.FileSystemCacheTest,org.apache.doris.datasource.hive.HiveMetaStoreCacheTest`\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  
--\u003e\n\n- Behavior changed:\n    - [x] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "4a0c58bcbcdb2f257f64ef2a856d39d0186bf7cd",
      "tree": "4429d7c85b53d35793cec4ef7b8e73b5f4b1ddcd",
      "parents": [
        "e0729979c710736f70c203e0724cb77e98667d81"
      ],
      "author": {
        "name": "Gavin Chou",
        "email": "gavin@selectdb.com",
        "time": "Mon Jun 01 10:10:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 10:10:37 2026 +0800"
      },
      "message": "[fix](cloud) Drain txn lazy committer workers before destruction (#63876)\n\n## What\nFix shutdown ordering in `TxnLazyCommitter` by explicitly stopping\nworker pools before member destruction can invalidate state used by\nworker callbacks.\n\n## Why\nLazy commit worker jobs keep a back pointer to `TxnLazyCommitter` and\ncall back into `remove()`. They can also access the parallel commit pool\nand resource manager during `commit()`. With the default destructor,\n`running_tasks_`, `mutex_`, and `parallel_commit_pool_` are destroyed\nbefore `worker_pool_` is joined, which can lead to shutdown-time\nuse-after-destruction.\n\n## How\n- Add an explicit `TxnLazyCommitter` destructor.\n- Mark the committer as stopped before draining workers.\n- Stop and join the lazy commit worker pool before destroying task\ntracking state.\n- Stop the parallel commit pool after lazy workers are quiesced.\n- Make failed or post-shutdown submissions complete with an error\ninstead of leaving waiters blocked.\n\n## Tests\n- `sh format_code.sh cloud/src/meta-service/txn_lazy_committer.h`\n- `sh format_code.sh cloud/src/meta-service/txn_lazy_committer.cpp`\n- `sh run-cloud-ut.sh --run --fdb\n\"fdb_cluster0:cluster0@10.26.20.4:4500\"`\n  - Build passed.\n  - `txn_lazy_commit_test` passed 24/24 in the full run.\n- The full run had unrelated storage vault/HDFS failures in\n`meta_service_test`.\n- After tightening the submit/shutdown race:\n- `sh run-cloud-ut.sh --run --fdb\n\"fdb_cluster0:cluster0@10.26.20.4:4500\" --filter\n\"txn_lazy_commit_test:*.*\"`\n- Build passed; 22/24 passed, 2 tests failed due FDB `Timeout` while\ncommitting setup transactions.\n\nCo-authored-by: gavinchou \u003cgavinchou@apache.org\u003e"
    },
    {
      "commit": "e0729979c710736f70c203e0724cb77e98667d81",
      "tree": "27ab0ad62266f873083bf9824efc4d55fd9d9c8e",
      "parents": [
        "487f783334648c60d982c49bdb315b01c29cdcc7"
      ],
      "author": {
        "name": "Chenyang Sun",
        "email": "sunchenyang@selectdb.com",
        "time": "Mon Jun 01 10:00:52 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 10:00:52 2026 +0800"
      },
      "message": "[refactor](be) remove CHAR padding on read  (#63291)\n\n- https://github.com/apache/doris-website/pull/3759/\n\n- Problem: The CHAR padding contract leaked from the storage layer into\nthe\ncompute / predicate layers — every scan stripped padding at the Block\nlevel,\nwhile predicates re-padded values to match the on-disk shape. Logic was\nspread\n   out and wasted work on every read.                                   \n- Fix: On-disk format unchanged. The convertor still pads CHAR to the\nschema\nlength on write, but the strip is pushed down to the page pre-decoder —\nthe\npage cache holds unpadded data. All shrink_* / pad_* code above the page\ncache\n   (SegmentIterator, Block, RowCursor, predicates) is removed.         \n- BloomFilter: BF probing is skipped (return true, fall back to scan)\nfor CHAR\npredicates — the BF hashes padded bytes but predicate values are\nunpadded, so\nthe probe would never match. Other indexes (ZoneMap / inverted / bitmap)\nare\n  unaffected.  \n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [x] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. 
eg:\nhttps://github.com/apache/doris-website/pull/3759--\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "487f783334648c60d982c49bdb315b01c29cdcc7",
      "tree": "8b5c30240c0aad461f0b61dad323041f47c329b0",
      "parents": [
        "6e5198b7cea96adfc62c80e5089d76402b150c3d"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Sat May 30 23:45:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 23:45:36 2026 +0800"
      },
      "message": "[Enhancement](udf) Do not check file when inline code exists (#63906)\n\nBefore this change, even when inline code was present, the FE would\nstill attempt to parse and validate the `FILE` in the `CREATE FUNCTION`\nstatement. However, during subsequent execution, even if `FILE` is\nvalid, it would not be used. Therefore, when inline code is present, we\ncan omit checking the `FILE` field when creating the table.\n\nbefore\n```sql\nDROP FUNCTION IF EXISTS py_inline_file_udf(INT);\nCREATE FUNCTION py_inline_file_udf(INT)\nRETURNS INT\nPROPERTIES (\n  \"type\"\u003d\"PYTHON_UDF\",\n  \"file\"\u003d\"http://127.0.0.1:12345/non_existent.zip\",\n  \"symbol\"\u003d\"evaluate\",\n  \"runtime_version\"\u003d\"3.12.11\",\n  \"always_nullable\"\u003d\"true\"\n)\nAS $$\ndef evaluate(x):\n    if x is None:\n        return None\n    return x + 100\n$$;\n\nSELECT py_inline_file_udf(val) FROM t_repro ORDER BY id;\n-- errCode \u003d 2, detailMessage \u003d cannot to compute object\u0027s checksum.\n```\n\nnow\n```sql\nDoris\u003e SELECT py_inline_file_udf(val) FROM t_repro ORDER BY id;\n+-------------------------+\n| py_inline_file_udf(val) |\n+-------------------------+\n|                     110 |\n|                     120 |\n|                     130 |\n+-------------------------+\n```"
    },
    {
      "commit": "6e5198b7cea96adfc62c80e5089d76402b150c3d",
      "tree": "6d05f5d239cfc83dc94fba8b52a8e42a48c850bd",
      "parents": [
        "1b44c051649f185a8b3cfd6e2fd5c81ed1879083"
      ],
      "author": {
        "name": "zhiqiang",
        "email": "seuhezhiqiang@163.com",
        "time": "Sat May 30 21:48:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 21:48:04 2026 +0800"
      },
      "message": "[test](regression) Add debug point ANN index-only scan test (#63859)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: The previous ANN index-only scan regression coverage\ninferred whether source vector columns were skipped by comparing\nScanBytes from query profiles. That made the test hard to review and\ncould miss cases where both query shapes still read the source column.\nReplace that coverage with a dedicated debug-point regression that\ndirectly fails if the embedding column is read in index-only scenarios,\nincluding a remapped reader-schema case where the source slot index\ndiffers from the storage column id. Remove the old profile-based suites\nand generated output.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test: Manual test\n    - git diff --cached --check\n- Regression test not run per request; an earlier attempt was blocked by\nMaven writing to /Users/roanhe/.m2/repository under the sandbox\n- Behavior changed: No\n- Does this need documentation: No\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. 
eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e"
    },
    {
      "commit": "1b44c051649f185a8b3cfd6e2fd5c81ed1879083",
      "tree": "f61b2ae17d8861f717efce04dc683180c139b563",
      "parents": [
        "c59abe09e606108b9269bca168804bdca24b266b"
      ],
      "author": {
        "name": "nooneuse",
        "email": "nooneuse@users.noreply.github.com",
        "time": "Sat May 30 18:36:41 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 18:36:41 2026 +0800"
      },
      "message": "Add datasketches HLL sketch aggregate functions (#63143)\n\n### What problem does this PR solve?\n\n\u003e An aggregate function is required to process user data containing\nDatasketches HLL sketches. In many data aggregation scenarios, users\npre‑aggregate detailed data in Hive using the sketching techniques\nprovided by Apache Datasketches, and then analyze the resulting sketches\nacross various OLAP engines. Compared with the HLL union aggregate\nfunctions natively offered by these engines, there are two key diff to\nusing Datasketches HLL sketches: firstly, the use cases differ; and\nsecondly, HLL sketches can be used seamlessly across different\nengines—for example, simultaneously in ES, Doris, and ClickHouse. Such\nrequirements are common in many production environments.\n\nIssue Number: \n- #63142(https://github.com/apache/doris/issues/63142)\n- #26416\n- #56246\n\nSummary:\nImplemented a built-in aggregate function that integrates the\nDatasketches HLL sketch. This aggregate function cannot rely on the Java\nUDF environment. Considering that in the Java UDF environment, Strings\nare encoded in UTF-8, which corrupts the binary data of sketches, the\nserialization/deserialization operations for sketches must be\nimplemented on the BE side. (additionally, since Apache Datasketches has\nbeen added to the contrib directory via a git submodule, it will become\nvery easy to add other sketches such as theta sketch in the future.)\n\n**see**: https://github.com/apache/doris/issues/63142\n**use case**: see regression test \u0026\nhttps://github.com/apache/doris/issues/63142\n\n---------\n\nCo-authored-by: yuanyuhao \u003cyuanyuhao@bytedance.com\u003e"
    },
    {
      "commit": "c59abe09e606108b9269bca168804bdca24b266b",
      "tree": "6775094ba858a564f93c69991f68ba9b52787934",
      "parents": [
        "d7c033f7f34df6d1d5b5bad439cc0a61c9f19076"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Sat May 30 14:32:59 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 14:32:59 2026 +0800"
      },
      "message": "[Fix](FoldConst) Preserve NaN in numeric constant folding (#63870)\n\nProblem Summary:\n\nSome numeric functions returned different results between normal\nexecution and FE constant folding when the input was NaN. The shared\nboundary helper `inputOutOfBound()` treated `Double.NaN` as out of bound\nbecause range comparisons with NaN are false, causing functions such as\n`ln`, `log`, `log2`, `log10`, `dlog10`, and `power` to fold NaN cases\nincorrectly to NULL or skip folding.\n\nbefore\n```sql\nDoris\u003e set debug_skip_fold_constant\u003dtrue;\nDoris\u003e select log(2 ,cast(\u0027nan\u0027 as double));\n+-------------------------------+\n| log(2 ,cast(\u0027nan\u0027 as double)) |\n+-------------------------------+\n|                           NaN |\n+-------------------------------+\n1 row in set (0.033 sec)\n\nDoris\u003e set debug_skip_fold_constant\u003dfalse;\nQuery OK, 0 rows affected (0.009 sec)\n\nDoris\u003e select log(2 ,cast(\u0027nan\u0027 as double));\n+-------------------------------+\n| log(2 ,cast(\u0027nan\u0027 as double)) |\n+-------------------------------+\n|                          NULL |\n+-------------------------------+\n1 row in set (0.008 sec)\n```\n\nnow\n```sql\nDoris\u003e set debug_skip_fold_constant\u003dtrue;\nQuery OK, 0 rows affected (0.001 sec)\n\nDoris\u003e select log(2 ,cast(\u0027nan\u0027 as double));\n+-------------------------------+\n| log(2 ,cast(\u0027nan\u0027 as double)) |\n+-------------------------------+\n|                           NaN |\n+-------------------------------+\n1 row in set (0.018 sec)\n\nDoris\u003e set debug_skip_fold_constant\u003dfalse;\nQuery OK, 0 rows affected (0.001 sec)\n\nDoris\u003e select log(2 ,cast(\u0027nan\u0027 as double));\n+-------------------------------+\n| log(2 ,cast(\u0027nan\u0027 as double)) |\n+-------------------------------+\n|                           NaN |\n+-------------------------------+\n1 row in set (0.003 sec)\n```"
    },
    {
      "commit": "d7c033f7f34df6d1d5b5bad439cc0a61c9f19076",
      "tree": "bd19639d69de87796740911e13dfac86359454c7",
      "parents": [
        "3c9c40fa5fd3d19ec70804afafd911992967b568"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Sat May 30 14:32:01 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 14:32:01 2026 +0800"
      },
      "message": "[Fix](variance) Fix sample variance/stddev NaN res for single value (#63605)\n\nProblem Summary:\n\nFix `VAR_SAMP`, `VARIANCE_SAMP`, and `STDDEV_SAMP` to return `NaN` when\nthe number of valid input values is less than or equal to 1. Sample\nvariance/stddev are undefined for `n \u003c\u003d 1`, so returning `0.0` is\nmisleading.\n\nbefore:\n```sql\nCREATE TABLE t (id INT, v DOUBLE) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES(\u0027replication_num\u0027\u003d\u00271\u0027);\nINSERT INTO t VALUES (1, 5.0);  -- 单行\n\nSELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;\n+-------------+----------------+\n| VAR_SAMP(v) | STDDEV_SAMP(v) |\n+-------------+----------------+\n|           0 |              0 |\n+-------------+----------------+\n```\n\nnow:\n```sql\nSELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;\n+-------------+----------------+\n| VAR_SAMP(v) | STDDEV_SAMP(v) |\n+-------------+----------------+\n|         NaN |            NaN |\n+-------------+----------------+\n```\n\ndoc: https://github.com/apache/doris-website/pull/3765"
    },
    {
      "commit": "3c9c40fa5fd3d19ec70804afafd911992967b568",
      "tree": "bceb1ce7e901a4442125a0d789a81b1f01a78806",
      "parents": [
        "477cb0c6bbaad03e7e65b4589560a3b8a1bf659a"
      ],
      "author": {
        "name": "Wen Zhenghu",
        "email": "wenzhenghu.zju@gmail.com",
        "time": "Sat May 30 08:28:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 08:28:03 2026 +0800"
      },
      "message": "[fix](fe) Fix broken pipe risk on stream load redirect with unconsumed request body (#63332)\n\n### What problem does this PR solve?\n\nIssue Number: close #63325\n\nProblem Summary:\n\n**Problem**\n- Starting from Doris `3.1.3`, FE uses `Jetty 12`, and this introduced a\ncompatibility change in the Stream Load redirect path.\n- When a Stream Load request is sent to FE, FE may return `307 Temporary\nRedirect` before the request body is fully consumed. Under `Jetty 12`,\nthis behavior is more likely to cause early connection close or reset\nwhile the client is still writing the request body.\n- As a result, some `HTTP/1.1` streaming clients may observe errors such\nas `BrokenPipeError` or `ConnectionResetError` when sending Stream Load\nrequests through FE.\n- The problem is more visible with chunked uploads, higher network\nlatency, and clients that continue sending request body data before\nfully processing the redirect response.\n- In short, this is a compatibility regression introduced by the `Jetty\n12` upgrade in Doris `3.1.3` and later.\n\n**Fix**\n- We keep the existing FE-to-BE redirect architecture unchanged, so FE\nstill redirects Stream Load requests to BE instead of proxying the full\nrequest body.\n- We add a bounded request-body drain step on the FE Stream Load\nredirect path:\n  - FE first writes the `307 Temporary Redirect` response.\n- FE then drains and discards only a bounded amount of the remaining\nrequest body.\n- This provides a small compatibility window for in-flight client writes\nand reduces the chance of early connection reset.\n- We also apply the same handling to token-authenticated Stream Load\nrequests, so both password-authenticated and token-authenticated paths\nbehave consistently.\n- In addition, we expose Jetty\u0027s unconsumed request content read setting\nthrough FE configuration and apply it to HTTP connectors, so operators\ncan tune Jetty behavior for redirect scenarios where the request body is\nnot fully 
consumed.\n- To make the compatibility path effective out of the box, this PR also\nenables the bounded drain path by default with a `1GB` drain limit and a\n`1000ms` idle wait window.\n\n**New Configurations**\n- `jetty_server_max_unconsumed_request_content_reads`\n- Controls how many extra reads Jetty performs for unconsumed request\ncontent.\n- `-1` means unlimited, `0` disables extra reads, and a positive value\nsets the maximum number of read attempts.\n  - Default value in this PR: `-1`.\n- This helps tune Jetty behavior after the `Jetty 12` upgrade when FE\nreturns a response before the request body is fully consumed.\n\n- `stream_load_redirect_bounded_drain_max_bytes`\n- Controls the maximum number of request body bytes FE drains after\nreturning `307` for a Stream Load redirect.\n  - `0` disables this compatibility logic.\n- A positive value enables bounded draining and limits how much data FE\nwill discard.\n  - Default value in this PR: `1GB`.\n\n- `stream_load_redirect_bounded_drain_max_idle_time_ms`\n- Controls how long FE waits for more readable request body data during\nthe bounded drain process.\n  - `0` disables the extra idle wait.\n- A positive value provides a small grace window for slow clients or\ndelayed body chunks, helping absorb in-flight writes without keeping the\nconnection open indefinitely.\n  - Default value in this PR: `1000ms`.\n\n**Test Result / Validation**\n- Verified the behavior with the same Python `HTTP/1.1` chunked Stream\nLoad reproduction used during issue analysis.\n- Reproduced requests were sent to FE with `Expect: 100-continue`,\n`Transfer-Encoding: chunked`, and paced body streaming to maximize the\nredirect race window.\n- Baseline validation on Doris `3.0` (`9030` / FE `8030`):\n  - `payload_mb\u003d1`, `chunk_kb\u003d1`, `sleep_ms\u003d0`\n  - `payload_mb\u003d8`, `chunk_kb\u003d16`, `sleep_ms\u003d10`\n  - Both requests returned normal `307 Temporary Redirect`.\n- Validation on the fixed Doris `3.1.4` instance 
(`9034` / FE `8034`):\n- Before enabling the bounded drain config, the same reproduction still\ntriggered `BrokenPipeError`.\n  - After enabling the FE configs below:\n    - `jetty_server_max_unconsumed_request_content_reads \u003d -1`\n    - `stream_load_redirect_bounded_drain_max_bytes \u003d 16777216`\n    - `stream_load_redirect_bounded_drain_max_idle_time_ms \u003d 1000`\n- The same two reproduction requests both returned normal `307 Temporary\nRedirect`.\n- No `BrokenPipeError` or `ConnectionResetError` was observed after the\nconfig took effect.\n- The PR now further updates the default bounded drain byte limit from\n`16MB` to `1GB`, while keeping the default idle wait at `1000ms`, so the\ncompatibility path is enabled by default with a more generous drain\nwindow.\n\n**Performance Validation**\n- I compared FE redirect response time between the Doris `3.0` baseline\ninstance (`9030`) and the fixed Doris `3.1.4` instance (`9034`).\n- The goal was to check whether the additional bounded drain logic on FE\nintroduces a noticeable regression compared with the original Jetty 9\nbehavior.\n\n**Test Setup**\n- Reproduction tool: `tools/stream_load_redirect_repro.py`\n- Target: FE endpoint on both instances\n- Client mode: `httpclient`\n- Common parameters:\n  - `chunk_kb \u003d 16`\n  - `sleep_ms \u003d 0`\n- Payload sizes:\n  - `32MB`\n  - `128MB`\n  - `512MB`\n- Each case was executed `3` times on each instance, and the average\n`elapsed_seconds` was used for comparison.\n\n**Results**\n- `32MB`\n  - `9030`: `9.515s / 12.032s / 9.727s`\n  - Average: `10.425s`\n  - `9034`: `10.695s / 10.847s / 8.763s`\n  - Average: `10.102s`\n  - Difference: `9034` was `0.323s` faster, about `3.1%`\n\n- `128MB`\n  - `9030`: `37.174s / 34.111s / 37.090s`\n  - Average: `36.125s`\n  - `9034`: `38.910s / 36.423s / 38.337s`\n  - Average: `37.890s`\n  - Difference: `9034` was `1.765s` slower, about `4.9%`\n\n- `512MB`\n  - `9030`: `157.181s / 161.148s / 174.421s`\n  - Average: 
`164.250s`\n  - `9034`: `172.310s / 176.692s / 160.068s`\n  - Average: `169.690s`\n  - Difference: `9034` was `5.440s` slower, about `3.3%`\n\n**Conclusion**\n- Across all tested payload sizes, the difference between `9030` and\n`9034` stayed within a small range, roughly `-3%` to `+5%`.\n- Based on these measurements, the FE bounded drain logic does not show\na significant performance regression compared with the baseline FE\nredirect behavior.\n- In other words, the fix improves redirect compatibility while keeping\nFE redirect response time at a similar level in normal request sizes.\n---------\n\nCo-authored-by: yaoxiao \u003cyx136264032@163.com\u003e"
    },
    {
      "commit": "477cb0c6bbaad03e7e65b4589560a3b8a1bf659a",
      "tree": "7b028c118f7d50dfabeab3d50c72ce652c2f7bb5",
      "parents": [
        "a7ad76ae5704536aa70f400d6d9bbad2be68bc50"
      ],
      "author": {
        "name": "Jerry Hu",
        "email": "hushenggang@selectdb.com",
        "time": "Fri May 29 19:11:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 19:11:17 2026 +0800"
      },
      "message": "[improvement](be) Add release-enabled Doris check macros (#63730)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: Add a dedicated `common/check.h` header for Doris check\nmacros. `DORIS_CHECK` accepts streamed context through the usual `\u003c\u003c`\nsyntax while avoiding evaluation of streamed operands on successful\nchecks. `DORIS_CHECK_EQ/NE/LT/LE/GT/GE` are intended for invariants that\nshould remain checked in Release builds: Debug builds map them to the\ncorresponding `DCHECK_*` macros, while Release builds evaluate each\noperand once, compare with the requested operator, and throw through the\nexisting `DORIS_CHECK`-style fatal error path with a message that\nincludes both compared expressions and their actual values. Release\ncomparison checks also accept streamed context. `status.h` re-exports\n`common/check.h` to keep existing includes compatible. The JSONB\nfunction call sites that rely on these invariants are switched from\n`DCHECK` to the new release-enabled Doris checks. The added\n`DorisCheckTest` coverage exercises `check.cpp` failure handling,\n`check.h` value formatting helpers, binary-op result formatting,\nstreamed messages, stream-operand laziness, comparison success, and\nsingle-evaluation behavior.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test: Unit Test / Manual test\n- `build-support/clang-format.sh\nbe/src/exprs/function/function_jsonb.cpp be/test/common/check_test.cpp`\n- `DORIS_HOME\u003d$PWD ninja -C be/ut_build_ASAN\nsrc/exprs/CMakeFiles/Exprs.dir/function/function_jsonb.cpp.o\ntest/CMakeFiles/doris_be_test.dir/common/check_test.cpp.o`\n    - `./run-be-ut.sh --run --filter\u003dDorisCheckTest.*`\n    - `build-support/check-format.sh`\n    - `git diff --cached --check`\n- Behavior changed: No\n- Does this need documentation: No"
    },
    {
      "commit": "a7ad76ae5704536aa70f400d6d9bbad2be68bc50",
      "tree": "3f2df82d0f216f1db58ac6919aeec76ff38400ba",
      "parents": [
        "148429f699e2b7eb5f3ebf9cdbcab9e375fb839a"
      ],
      "author": {
        "name": "Jerry Hu",
        "email": "hushenggang@selectdb.com",
        "time": "Fri May 29 18:42:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 18:42:36 2026 +0800"
      },
      "message": "[fix](be) Preserve null probe rows in mark anti join (#63767)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary: Correlated `NOT IN` subqueries under disjunction can be\nrewritten to a mark null-aware left anti join with additional join\nconjuncts. When the probe join key is `NULL`, the hash table lookup\nadvanced the probe index before the caller could run the null-probe\nhandling path. As a result, the probe row was skipped before the mark\ncolumn was evaluated by the outer disjunction, producing incomplete\nquery results. This change keeps the probe index on the `NULL` row so\nthe null-aware join path can emit the correct mark value.\n\n### Release note\n\nFix incorrect results for correlated `NOT IN` subqueries combined with\ndisjunctions.\n\n### Check List (For Author)\n\n- Test:\n- Regression test: `doris-local-regression.sh --network 10.26.20.3/24\nrun -d correctness -s test_subquery_in_disjunction -forceGenOut`\n- Regression test: `doris-local-regression.sh --network 10.26.20.3/24\nrun -d correctness -s test_subquery_in_disjunction`\n- Manual test: verified the `NOT IN` + `OR` reproducer before and after\nthe fix on a local FE/BE cluster\n    - Build: `./build.sh --be`\n- Behavior changed: Yes. Corrects query result semantics for affected\nnull-aware mark anti joins.\n- Does this need documentation: No"
    },
    {
      "commit": "148429f699e2b7eb5f3ebf9cdbcab9e375fb839a",
      "tree": "7f1670060ef376f448fda6e85256821e73734603",
      "parents": [
        "7fc8e276284eac8a0155b90155f54e0e9723c0a6"
      ],
      "author": {
        "name": "Chenyang Sun",
        "email": "sunchenyang@selectdb.com",
        "time": "Fri May 29 18:05:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 18:05:36 2026 +0800"
      },
      "message": "[fix](test) Wait for target rowset count in test_time_series_compaction_policy (#63890)\n\n### Release note\n\nNone\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "7fc8e276284eac8a0155b90155f54e0e9723c0a6",
      "tree": "e015dbfe9ce4e2c00cc44641dece570a74a85ad0",
      "parents": [
        "6e27f117471f481e13cebabe0454dddc60e5245c"
      ],
      "author": {
        "name": "Mingyu Chen (Rayner)",
        "email": "yunyou@selectdb.com",
        "time": "Fri May 29 14:30:16 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 14:30:16 2026 +0800"
      },
      "message": "[feat](sql-parser) Split SQL grammar into standalone fe-sql-parser (#63823)\n\n## Summary\n\nSplit SQL syntax parsing out of `fe-core` into a new `fe-sql-parser`\nmodule that produces an ANTLR parse tree (CST) without semantic\nanalysis. The new module can be packaged as an independent jar for\nexternal consumers (third-party tools, linters, format converters, etc.)\nwithout dragging in `LogicalPlan`, `Catalog`, `ConnectContext`, or any\nother fe-core internals.\n\n### Module changes\n\n- Move `DorisLexer.g4` / `DorisParser.g4` and 8 supporting Java files\n(`CaseInsensitiveStream`, `Origin`, `ParserUtils`, `ParseErrorListener`,\n`PostProcessor`, `ParseException`, `SyntaxParseException`,\n`QueryParsingErrors`) into `fe-sql-parser`. Package names are preserved\nso fe-core\u0027s hundreds of imports do not move.\n- `ParseException` now extends `RuntimeException` directly to break the\nchain through `nereids.exceptions.AnalysisException`, which references\n`LogicalPlan`.\n- Introduce an `OriginAware` SPI in `fe-sql-parser` so `ParserUtils`\nkeeps its per-thread field-based fast path (originally added in #52125).\n`MoreFieldsThread` in fe-core implements the interface; threads that\ndon\u0027t fall back to ThreadLocal — correctness is identical either way.\n- Add `org.apache.doris.sqlparser.DorisSqlParser` facade with\n`parseStatement` / `parseStatements` / `parseExpression`.\n- fe-core reverse-depends on fe-sql-parser; its own\n`antlr4-maven-plugin` now only processes the Nereids pattern-generator\u0027s\n`JavaLexer.g4` / `JavaParser.g4`.\n\nThe new module\u0027s only runtime dependency is `org.antlr:antlr4-runtime`.\n\n### Standalone CLI\n\n`mvn -pl fe-sql-parser -Pcli package` produces a self-contained\nexecutable jar (`fe-sql-parser-*-cli.jar`, ~1.7 MB after minimize-shade)\nso the parser can be invoked directly from a shell:\n\n```bash\n$ java -jar fe-sql-parser-*-cli.jar \"SELECT a FROM t WHERE a \u003e 1\"\n(singleStatement (statement ...) 
\u003cEOF\u003e)\n\n$ java -jar fe-sql-parser-*-cli.jar --pretty --multi \"USE db; SELECT 1; SELECT 2\"\n$ java -jar fe-sql-parser-*-cli.jar --expression \"a + 1 * COALESCE(b, 0)\"\n$ echo \"SELECT 1\" | java -jar fe-sql-parser-*-cli.jar\n$ java -jar fe-sql-parser-*-cli.jar -f query.sql\n```\n\nThe CLI is gated behind the `cli` Maven profile so default Doris builds\ndo not pay the shading cost; the thin jar consumed by fe-core is\nunchanged. Exit codes: `0` success, `1` parse error, `2` usage or I/O\nerror.\n\n### Extension hooks for downstream tools\n\nDownstream projects can plug in custom logic (SQL lineage, policy\nenforcement, audit, rewriting, metrics) **without modifying\n`fe-sql-parser`**. Four mechanisms are available:\n\n| Mechanism | When it fires | Typical use |\n|-----------|---------------|-------------|\n| Subclass `DorisParserBaseVisitor\u003cT\u003e` | After parsing | Extract\ninformation, rewrite, lineage |\n| Subclass `DorisParserBaseListener` | After parsing | Simple\n`enter`/`exit` interception |\n| `parser.addParseListener(...)` via `newLexer` / `newParser` | Live,\nduring parsing | Token-level processing, on-the-fly mutation |\n| Wrap `DorisSqlParser` | Around the call | Metrics, caching,\nrequest-level policy |\n\n`fe/fe-sql-parser/README.md` contains end-to-end examples for SQL\nlineage extraction, policy/audit listeners, hint collection during\nparsing, an instrumented facade with caching and metrics, plus tips on\nlocating rule names and debugging visitors with the CLI. 
It also\ndocuments the build modes, library/Visitor usage, configuration flags\n(`noBackslashEscapes`, `ansiSqlSyntax`), the `OriginAware` fast-path\nSPI, and current caveats.\n\n## Test plan\n\n- [x] `fe-sql-parser` unit tests: 7 new cases in `DorisSqlParserTest`\ncovering `SELECT`, `SELECT FROM WHERE`, multi-statement, expression, DDL\n(`CREATE TABLE` with `DISTRIBUTED` + `PROPERTIES`), malformed SQL, and\ntrailing-garbage expressions\n- [x] Full `fe` reactor compiles (`mvn -pl fe-core -am compile`)\n- [x] fe-core\u0027s 15 existing `org.apache.doris.nereids.parser.*Test`\nclasses pass (160 cases, 0 failures)\n- [x] CLI smoke test: positional / `-e` / `-f` / stdin input modes;\n`--multi` / `--expression` parse modes; `--pretty` output; parse-error\nexit code\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "6e27f117471f481e13cebabe0454dddc60e5245c",
      "tree": "7d07ceed405016c91fcc2959fe5732e71ee82bf9",
      "parents": [
        "aa68e4bd9e74dc84eaeb3860058ba818e2081abb"
      ],
      "author": {
        "name": "wudi",
        "email": "wudi@selectdb.com",
        "time": "Fri May 29 11:39:11 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 11:39:11 2026 +0800"
      },
      "message": "[improve](streaming-job) avoid potential OOM when reading large snapshot splits (#63833)\n\n## Summary\n- Default-skip flink-cdc\u0027s in-snapshot backfill on the from-to path so\nlarge splits no longer accumulate the entire chunk + backfill stream in\nthe fetcher\u0027s outputBuffer; from-to is at-least-once and tolerates the\nduplicates this introduces. TVF (job-driven and standalone) keeps the\nstandard `false` default for exactly-once via per-task offset commit.\n- Expose `skip_snapshot_backfill` as a user-facing property with strict\n`true`/`false` validation on both from-to (CREATE JOB) and TVF (SELECT\nFROM cdc_stream(...)) entry points.\n- Fix snapshot completion under `pollWithoutBuffer`: a split is now\nmarked complete only after its high-watermark event has been consumed\n(`splitState.getHighWatermark() !\u003d null`), not on the first non-empty\nfetcher batch. Without this, enabling the new default truncates any\nsplit larger than debezium\u0027s `max.batch.size` and yields an NPE on\noffset extraction.\n- Read `streaming_task_timeout_multiplier` live in\n`StreamingMultiTblTask.isTimeout()` so `admin set frontend config`\naffects already-running tasks, matching the `@ConfField(mutable\u003dtrue)`\ncontract."
    },
    {
      "commit": "aa68e4bd9e74dc84eaeb3860058ba818e2081abb",
      "tree": "fc05d38e5b6e936e0fccb06eacc8c9f657e4e25a",
      "parents": [
        "1e8e91dbee17a06c3885b1ba174741b6c4eef0d9"
      ],
      "author": {
        "name": "linrrarity",
        "email": "linzhenqi@selectdb.com",
        "time": "Fri May 29 11:34:39 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 11:34:39 2026 +0800"
      },
      "message": "[Enhancement](udf) Reject bitmap, hll, and quantile_state in udf create (#63849)\n\nProblem Summary:\n\nUDF creation currently allows `BITMAP`, `HLL`, and `QUANTILE_STATE` in\nfunction signatures, but these object types are not exposed to\nJava/Python UDF runtimes as first-class values. They are effectively\nbridged as opaque bytes, and marked unsupported in\n[doc](https://doris.apache.org/docs/dev/query-data/udf/python-user-defined-function#data-type-mapping)"
    },
    {
      "commit": "1e8e91dbee17a06c3885b1ba174741b6c4eef0d9",
      "tree": "05512dd5356b6881d856961a02dbc3b11ea206a1",
      "parents": [
        "113fd2da3424f81c41e3ec2ed427074a56d2cf3f"
      ],
      "author": {
        "name": "Yixuan Wang",
        "email": "wangyixuan@selectdb.com",
        "time": "Fri May 29 11:24:57 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 11:24:57 2026 +0800"
      },
      "message": "[fix](recycler) Reduce recycle_job_lease_expired_ms for SnapshotDataMigrator (#63388)"
    },
    {
      "commit": "113fd2da3424f81c41e3ec2ed427074a56d2cf3f",
      "tree": "9d702fc760f15fea42db31276c67e3e5eb9dfbec",
      "parents": [
        "2570dd88f628eb81e931bd4a5d3c68d3b224fee1"
      ],
      "author": {
        "name": "收集群风",
        "email": "xdj483829269@163.com",
        "time": "Fri May 29 11:07:46 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 11:07:46 2026 +0800"
      },
      "message": "Add levenshtein and hamming_distance functions (#60412)\n\nRelated Issue: #48203\nRelated PR: #57144 (reference)\nProblem Summary: support levenshtein (Hive) and hamming_distance\n(Trino/Presto)."
    },
    {
      "commit": "2570dd88f628eb81e931bd4a5d3c68d3b224fee1",
      "tree": "ae77c5a6ff42b9118eceded30e27dd1cee0afa86",
      "parents": [
        "2ddf97a1a38707268e7ca4e267c8c6b293ddbdf4"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Fri May 29 11:07:38 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 11:07:38 2026 +0800"
      },
      "message": "[fix](fe) Remove decimal literal debug logs (#63841)\n\n### What problem does this PR solve?\n\nDecimal literal cast and construction paths print high-frequency INFO\ndebug logs. In FE runtime these logs can be routed through stderr and\nappear as ERROR lines, flooding FE logs during normal query traffic.\n\nThe noisy logs were introduced by #50940.\n\nThis PR removes the debug logs from the decimal literal hot paths.\n\n### Check List\n\n- Test: Unit Test\n  - LiteralTest\n- Behavior changed: No\n- Does this need documentation: No"
    },
    {
      "commit": "2ddf97a1a38707268e7ca4e267c8c6b293ddbdf4",
      "tree": "be79e069b50401ad401d453956de95fae5476547",
      "parents": [
        "9688e57f280da153268779603f0470c5c340960d"
      ],
      "author": {
        "name": "Qi Chen",
        "email": "chenqi@selectdb.com",
        "time": "Fri May 29 10:55:29 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 10:55:29 2026 +0800"
      },
      "message": "[fix](ann-index) Fix ANN range search state leakage and incorrect slot index tracking. (#63666)\n\n### Release note\n\nANN range search execution state was stored on shared VExpr roots.\nVExprContext clones share the root expression, so a segment that\nexecuted ANN range search could leak that state into another segment\nwithout an ANN index and incorrectly remove the common expression. ANN\nrange search also mixed schema column indexes with storage column ids\nwhen updating common expression index status, so remapped schemas failed\nto mark the source slot expression as evaluated. This patch returns ANN\nexecution state through the current evaluation call, stores ANN root\nbitmap in the current segment IndexContext, and updates slot index\nstatus by source column index."
    },
    {
      "commit": "9688e57f280da153268779603f0470c5c340960d",
      "tree": "abb5a3e694eac5958623baebbad6aaa3ca4f92f4",
      "parents": [
        "4f1dcdf33956a8adaf956ebca46da2820abf3ffb"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Fri May 29 10:54:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 10:54:37 2026 +0800"
      },
      "message": "[fix](cloud) normalize SHOW PARTITIONS display for storage and replica (#60871)\n\nIn cloud mode, SHOW PARTITIONS now displays StorageMedium as\nOBJECT_STORAGE and ReplicaAllocation as \u003cnull\u003e. Also add\nPartitionsProcDirTest to cover cloud/non-cloud display behavior."
    },
    {
      "commit": "4f1dcdf33956a8adaf956ebca46da2820abf3ffb",
      "tree": "e2e60fe34302130572debe39eeea75a8d5340b5a",
      "parents": [
        "b653831c9fc7ad6a182b4bdfdc028c0134448c59"
      ],
      "author": {
        "name": "Chenyang Sun",
        "email": "sunchenyang@selectdb.com",
        "time": "Fri May 29 10:48:33 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 10:48:33 2026 +0800"
      },
      "message": "[refactor](BE) split EncodingInfo defaults into 4 explicit maps (#63622)\n\nReplace the EncodingPreference + runtime hook machinery in\nEncodingInfoResolver with four explicit maps and four matching get\nmethods:\n\n  - _v2_default_map     -\u003e get_v2_default_encoding(type)\n  - _v3_default_map         -\u003e get_v3_default_encoding(type)\n  - _index_column_default_map -\u003e get_index_column_encoding(type)\n  - _encoding_map           -\u003e get(type, encoding, out)\n\nNo on-disk format change; the resolved encodings written into\nColumnMetaPB match the pre-refactor outputs for both v2 and V3 tablets.\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [x] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "b653831c9fc7ad6a182b4bdfdc028c0134448c59",
      "tree": "96d2a97ac1a28d2b70393b2cb878ebaae61c6cc6",
      "parents": [
        "99691f6895d6d3f527ced114fb225a742888e4b4"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Fri May 29 10:42:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 10:42:08 2026 +0800"
      },
      "message": "[fix](function) deduplicate map keys after string-to-map cast (#63713)\n\n### What problem does this PR solve?\n\n\nProblem Summary:\nCasting a JSON string with duplicated object keys to MAP kept all\nduplicated entries because the string-to-complex cast path returned the\ngeneric wrapper directly and skipped ColumnMap::deduplicate_keys(). This\nmade string-to-map casts inconsistent with MAP constructor semantics\nwhere the last value wins.\n\nReproduction SQL:\n\n```sql\nSELECT CAST(\u0027{\"a\":1,\"a\":2}\u0027 AS MAP\u003cSTRING,INT\u003e);\nSELECT size(CAST(\u0027{\"a\":1,\"a\":2}\u0027 AS MAP\u003cSTRING,INT\u003e));\nSELECT element_at(CAST(\u0027{\"a\":1,\"a\":2}\u0027 AS MAP\u003cSTRING,INT\u003e), \u0027a\u0027);\n\nSELECT CAST(\u0027{\"outer\":{\"a\":1,\"a\":2}}\u0027 AS MAP\u003cSTRING, MAP\u003cSTRING, INT\u003e\u003e);\nSELECT element_at(element_at(CAST(\u0027{\"outer\":{\"a\":1,\"a\":2}}\u0027 AS MAP\u003cSTRING, MAP\u003cSTRING, INT\u003e\u003e), \u0027outer\u0027), \u0027a\u0027);\n\nSELECT map(\u0027a\u0027,1,\u0027a\u0027,2);\nSELECT size(map(\u0027a\u0027,1,\u0027a\u0027,2));\nSELECT element_at(map(\u0027a\u0027,1,\u0027a\u0027,2), \u0027a\u0027);\n```\n\nBefore this fix:\n\n```text\n{\"a\":1, \"a\":2}\n2\n1\n\n{\"outer\":{\"a\":1, \"a\":2}}\n1\n\n{\"a\":2}\n1\n2\n```\n\nAfter this fix:\n\n```text\n{\"a\":2}\n1\n2\n\n{\"outer\":{\"a\":2}}\n2\n\n{\"a\":2}\n1\n2\n```"
    },
    {
      "commit": "99691f6895d6d3f527ced114fb225a742888e4b4",
      "tree": "0fb75d89ee0c8373880dbe4701c0df54c76c485e",
      "parents": [
        "87316004891ab8d32f107b353c48bc2b65625425"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Fri May 29 10:39:25 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 10:39:25 2026 +0800"
      },
      "message": "[refine](function) use typed ANN query vector (#63834)\n\nANN query vector extraction returned a generic `IColumn::Ptr`, so the\nTopN and range search paths had to downcast the column again before\nreading float data. This made the code more indirect and delayed type\nvalidation. This PR changes the helper and runtime state to keep the\nquery vector as `ColumnFloat32::Ptr`, validates the concrete type at\nextraction time, and removes redundant casts from the ANN execution\npath."
    },
    {
      "commit": "87316004891ab8d32f107b353c48bc2b65625425",
      "tree": "5f1a91c43c1c9677cc015f10dcb5601b4d2a4871",
      "parents": [
        "c27fef6ac08ee7b79946066e0d2c8864689f6c2f"
      ],
      "author": {
        "name": "Mryange",
        "email": "yanxuecheng@selectdb.com",
        "time": "Fri May 29 10:08:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 10:08:17 2026 +0800"
      },
      "message": "[fix](be) Fix timestamptz group_array state serde (#63827)\n\nFix collect_list/group_array on nested TIMESTAMPTZ values when complex\naggregate state is serialized through JSON. This keeps the existing\nstate format for compatibility, provides a UTC timezone during serde,\nand adds regression coverage for the nested group_array case."
    },
    {
      "commit": "c27fef6ac08ee7b79946066e0d2c8864689f6c2f",
      "tree": "5b8c58004c2d48f8276ec9fc8d8c4aa35e84c4de",
      "parents": [
        "0bd933ceecf1762c948b5ed51291fa9791ec5c85"
      ],
      "author": {
        "name": "lihangyu",
        "email": "lihangyu@selectdb.com",
        "time": "Fri May 29 09:50:38 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 09:50:38 2026 +0800"
      },
      "message": "[fix](file cache) guard null IOContext in cached remote reader (#63842)\n\n- Guard `CachedRemoteFileReader::read_at_impl` against nullable\n`IOContext`.\n- Pass `NativeReader` `_io_ctx` through header and block reads.\n- Add BE unit coverage for reading through `CachedRemoteFileReader`\nwithout an explicit `IOContext`."
    },
    {
      "commit": "0bd933ceecf1762c948b5ed51291fa9791ec5c85",
      "tree": "bc87755492ac8884376768b0d6119fcf6a174e11",
      "parents": [
        "ce0784d7e32a28dabc114f323e7009b67024eab4"
      ],
      "author": {
        "name": "Gabriel",
        "email": "liwenqiang@selectdb.com",
        "time": "Fri May 29 09:10:21 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 09:10:21 2026 +0800"
      },
      "message": "[fix](be) Keep prefetch reader alive for async tasks (#63796)\n\nProblem Summary: S3/OSS prefetch timeout can cancel and close\nPrefetchBufferedReader while an async PrefetchBuffer task is still\nrunning. The task kept PrefetchBuffer alive but only stored the\nunderlying FileReader as a raw pointer, so the owner could destroy the\nreader before the async task resumed on the error path and logged reader\nmetadata. Keep a shared FileReader reference in each PrefetchBuffer so\nthe async prefetch task cannot outlive the reader it dereferences, and\nadd a unit test that covers close timeout while the prefetch read is\nblocked."
    },
    {
      "commit": "ce0784d7e32a28dabc114f323e7009b67024eab4",
      "tree": "b9bb44cd41d70c6aaedba68cdfd727873e0115d4",
      "parents": [
        "d232caa533054d0234d677f5aa891c5c9ca6de97"
      ],
      "author": {
        "name": "Chenyang Sun",
        "email": "sunchenyang@selectdb.com",
        "time": "Thu May 28 22:03:59 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 22:03:59 2026 +0800"
      },
      "message": "[fix](test) Cast variant subcolumn as json in variant_hirachinal for stable output (#63828)\n\ncast(v[\u0027c\u0027] as string) on a heterogeneous variant column can produce\ndifferent whitespace formatting (e.g. \"[1, 2, 3]\" vs \"[1,2,3]\")\ndepending on session variables fuzzed by use_fuzzy_session_variable\n(batch_size, enable_fold_constant_by_be, etc.), causing intermittent\nregression failures. Casting to json normalizes the serialization path\nand is stable across fuzzed execution configs (verified 50/50 runs)."
    },
    {
      "commit": "d232caa533054d0234d677f5aa891c5c9ca6de97",
      "tree": "c8946bae6439fc7e2404d841dbca4ed8dcf0f993",
      "parents": [
        "345f6b978d02f7b828155889836eeed769d3eb1c"
      ],
      "author": {
        "name": "Pxl",
        "email": "xl@selectdb.com",
        "time": "Thu May 28 21:04:33 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 21:04:33 2026 +0800"
      },
      "message": "[fix](be) Preserve agg hash shuffle after non-hash exchange (#63766)\n\nRelated PR: #63529, #62438\n\nProblem Summary: `enable_local_exchange_before_agg\u003dfalse` allows\nfirst-phase aggregation to skip the local hash exchange before agg for\nperformance. This is only correct when the input still preserves local\nkey distribution.\n\nAfter #62438, nested loop join and other operators may introduce\nnon-hash local exchanges such as `ADAPTIVE_PASSTHROUGH`. Those exchanges\ncan split rows with the same group/distinct key across local pipeline\ntasks. If agg still skips the hash local exchange, partial aggregation\nstates for the same key are built in different tasks and later\n`COUNT(DISTINCT ...)` can over-count. The reproduced query in\n`output/ddl.txt` returned wrong counts such as `18/20` instead of `10`.\n\nThis PR preserves correctness while keeping the knob usable:\n\n- Aggregation operators now skip local exchange with\n`enable_local_exchange_before_agg\u003dfalse` only when the child preserves\nlocal key distribution.\n- The shared child-distribution check is reused by `AggSinkOperatorX`,\n`StreamingAggOperatorX`, and `DistinctStreamingAggOperatorX`.\n- `Pipeline::need_to_local_exchange()` also handles the case where the\ncurrent pipeline source is a non-hash `LocalExchangeSource` but the\ndownstream target requires hash distribution, so inherited hash-ish\npipeline state cannot incorrectly suppress the required local exchange.\n- Regression coverage is added for the nested-loop-join + distinct\naggregation wrong-result case, and unit tests cover the agg distribution\ndecisions."
    },
    {
      "commit": "345f6b978d02f7b828155889836eeed769d3eb1c",
      "tree": "7e4ab0156c25f77b875e1e9c049f0477e66f4b4e",
      "parents": [
        "a183718e000d0f09b6cb2e12c4dc7c59d5417894"
      ],
      "author": {
        "name": "deardeng",
        "email": "dengxin@selectdb.com",
        "time": "Thu May 28 20:30:42 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 20:30:42 2026 +0800"
      },
      "message": "[fix](cloud) CloudUpgradeMgr inspect and abort failed conflict txns while waiting (#60830)\n\nWhen CloudUpgradeMgr waits for unfinished transactions after registering\nwatershed txn ids, it now proactively inspects conflict transactions for\nthe target db/table set and logs sampled txn details for diagnosis.\n\nIf enable_abort_txn_by_checking_conflict_txn is enabled, the manager\ninvokes GlobalTransactionMgr.checkFailedTxns() and aborts failed txns to\nreduce the chance of upgrade being blocked by stale/conflicting txns.\nAbort failures are handled per txn and do not stop processing the rest.\n\nThis commit also adds tests:\n\n- FE UT CloudUpgradeMgrTest to verify enabled/disabled behavior and\n  continue-on-abort-error semantics.\n- cloud multi_cluster docker regression case\n  test_unfinished_txn_2pc.groovy to reproduce and validate long-running\n  unfinished 2PC txn behavior."
    },
    {
      "commit": "a183718e000d0f09b6cb2e12c4dc7c59d5417894",
      "tree": "f5dbb76bc4f342ad2464834f808f3a624e7faee9",
      "parents": [
        "a95974c6dded2dcf839181d387eb41761b24c889"
      ],
      "author": {
        "name": "924060929",
        "email": "lanhuajian@selectdb.com",
        "time": "Thu May 28 16:43:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 16:43:03 2026 +0800"
      },
      "message": "[fix](coordinator) fix computeDestIdToInstanceId picking wrong ExchangeNode for multi-input fragments (#63615)\n\n## Proposed changes\n\nFix `Rows mismatched! Data may be lost` error when a fragment receives\ndata from\nmultiple ExchangeNode inputs with different partition types (e.g. NLJ\nwith\nHASH-partitioned probe + BROADCAST build).\n\n### Root cause\n\n`ThriftPlansBuilder.filterInstancesWhichReceiveDataFromRemote` used\n`.iterator().next()` to pick the first input ExchangeNode. The iteration\norder\nover a `Set\u003cEntry\u003e` is non-deterministic. When it happens to pick the\nBROADCAST\ninput (1 destination per BE), `shuffle_idx_to_instance_idx` has only 1\nentry,\nwhile the HASH LOCAL_EXCHANGE expects N entries (one per pipeline task).\nMost\nhash partition indices find no mapping, and BE reports the error.\n\nReproduction: a CTE query with `MultiCastDataSinks` sending\nUNPARTITIONED (to a\nBROADCAST build) and HASH_PARTITIONED (to an INNER JOIN build) into the\nsame\nscan-free fragment. The bug is non-deterministic because it depends on\nSet\niteration order.\n\n### Fix\n\nIterate all input exchanges and select the one with the most\ndestinations on the\ntarget worker. This correctly identifies the main data-carrying\n(HASH-partitioned) exchange, ensuring the map is complete."
    },
    {
      "commit": "a95974c6dded2dcf839181d387eb41761b24c889",
      "tree": "254f4ad62094b03bb14d57fc60dbb779aa0303d8",
      "parents": [
        "7c4dfe9f2892284792c5df61994ea82e3e3add97"
      ],
      "author": {
        "name": "shee",
        "email": "13843187+qzsee@users.noreply.github.com",
        "time": "Thu May 28 15:22:44 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 15:22:44 2026 +0800"
      },
      "message": "[BUG](exec) fix coalesce function output null (#63092)\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nExample: COALESCE(same_department_income_amount, 0) \u003d\u003d\u003e outputs NULL\n(where same_department_income_amount is of type double).\n\nWhen assigning the value to the result column in the computation, the\nassignment is done unconditionally (forced), as in:\n\n```cpp\nresult_raw_data[row] +\u003d\n                    column_raw_data[row] *\n                    typename ColumnType::value_type(!(null_map_data[row] | filled_flag[row]));\n```\nIf the argument column column_raw_data\u0027s null_map[row] is 1, then the\nvalue stored in column_raw_data[row] is garbage data. This garbage may\ncontain values such as NaN. If a preceding argument of COALESCE happens\nto be assigned NaN, then during subsequent assignments we run into cases\nlike:\n\n0 * NaN \u003d NaN\nnum + NaN \u003d NaN\n\nso the assigned result also becomes NaN, which causes value pollution.\n\nBy rights the final output should also be NaN, but what is actually\nreturned is NULL. 
The reason is that during result serialization/output,\nNaN values are emitted as NULL.\n\n```cpp\nStatus DataTypeNumberSerDe\u003cT\u003e::_write_column_to_mysql(const IColumn\u0026 column,\n                                                      MysqlRowBuffer\u003cis_binary_format\u003e\u0026 result,\n                                                      int row_idx, bool col_const,\n                                                      const FormatOptions\u0026 options) const {\n    //...\n    else if constexpr (std::is_same_v\u003cT, float\u003e) {\n        if (std::isnan(data[col_index])) {\n            // Handle NaN for float, we should push null value\n            buf_ret \u003d result.push_null();\n        } else {\n            buf_ret \u003d result.push_float(data[col_index]);\n        }\n    } \n  //...\n}\n```\n\n\nCo-authored-by: garenshi \u003cgarenshi@tencent.com\u003e"
    },
    {
      "commit": "7c4dfe9f2892284792c5df61994ea82e3e3add97",
      "tree": "6e61c8de5a73ddccc9bbced8ccc9232cc0626bbe",
      "parents": [
        "a0e0ee55cf6e7f02a73e32aa9b71f97cf9bbba9c"
      ],
      "author": {
        "name": "wudi",
        "email": "676366545@qq.com",
        "time": "Thu May 28 15:11:02 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 15:11:02 2026 +0800"
      },
      "message": "[improve](streaming-job) support user-specified mysql server_id with per-reader assignment (#63490)\n\n## Summary\n\n- Add an optional `server_id` source property for MySQL CDC streaming\njobs. Accepts a single value (e.g. `5400`) or a range (e.g.\n`5400-5408`). When unset, the value is derived from the jobId hash so\nexisting jobs keep their current server_id when `snapshot_parallelism \u003d\n1`.\n- Fix a latent collision: when `snapshot_parallelism \u003e 1` and\nsource-side DML happens during snapshot, all parallel\n`SnapshotSplitReader` instances previously shared the same server_id and\ntheir backfill BinaryLogClient connections kicked each other out of\nMySQL\u0027s dump-thread slot, dropping binlog events between low and high\nwatermark. Each subtask now gets a distinct server_id from the resolved\nrange; the single binlog reader uses the range start.\n- Cross-field check: reject `server_id` range width smaller than\n`snapshot_parallelism` at job startup with a clear fix-it suggestion."
    },
    {
      "commit": "a0e0ee55cf6e7f02a73e32aa9b71f97cf9bbba9c",
      "tree": "ce27fa3d9c760fdd115c15cb5bf7bbd991246815",
      "parents": [
        "a04dac6b6fccf2732c4ef36959d88eacdeae3845"
      ],
      "author": {
        "name": "wudi",
        "email": "wudi@selectdb.com",
        "time": "Thu May 28 15:06:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 15:06:20 2026 +0800"
      },
      "message": "[fix](streaming-job) fix postgres historical-date timestamp handling in cdc-client (#63618)\n\n### What problem does this PR solve?\n\nProblem Summary:\n\nWhen a Postgres CDC streaming job ingests rows whose timestamp / date\ncolumns hold historical values (pre-1970 with sub-millisecond precision,\nor pre-1582 / pre-1901 dates), two independent bugs in cdc-client cause\ndata corruption or task crash:\n\n1. `DebeziumJsonDeserializer.convertTimestamp` uses signed `/` and `%`\non negative `micros` / `nanos`, producing a negative `nanoOfMillisecond`\nand tripping Flink `TimestampData`\u0027s `checkArgument(nanoOfMillisecond \u003e\u003d\n0)`. Result: the ingestion task crashes whenever a pre-1970 timestamp\nwith sub-millisecond precision flows through (e.g. `1969-12-31\n23:59:59.999123`).\n\n2. The snapshot path reads column values via `rs.getObject()`, which\nroutes through PG JDBC\u0027s `TimestampUtils` + `GregorianCalendar`. For\npre-1582 timestamps the Julian/proleptic cutover shifts values by N\ndays; for pre-1901 timestamps the JVM time zone\u0027s LMT offset shifts\nvalues by the LMT difference (e.g. ~343s in `Asia/Shanghai`). Result:\nthe same PG value (e.g. `0001-01-01 00:00:00`) yields different doris\nvalues depending on whether the row was synced via snapshot or via\nbinlog.\n\nThis PR fixes both:\n\n1. Use `Math.floorDiv` / `Math.floorMod` so the millisecond / nanosecond\nsplit stays valid for negative epoch values.\n2. Dispatch `TIMESTAMP` / `TIMESTAMPTZ` / `DATE` columns through\n`LocalDateTime` / `OffsetDateTime` / `LocalDate` in the snapshot reader,\nbypassing `GregorianCalendar` entirely. Preserve the legacy\n`Timestamp(Long.MAX/MIN_VALUE)` sentinel for `+/-infinity`."
    },
    {
      "commit": "a04dac6b6fccf2732c4ef36959d88eacdeae3845",
      "tree": "478ef4f88460a153a34612bc998d4e0b2493aaeb",
      "parents": [
        "d1060850bed233b273c638a1d688e340c57b8c03"
      ],
      "author": {
        "name": "lihangyu",
        "email": "lihangyu@selectdb.com",
        "time": "Thu May 28 14:54:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 14:54:03 2026 +0800"
      },
      "message": "[fix](match) Allow MATCH on aliased variant subcolumns (#63772)\n\nMATCH predicates fail for VARIANT dot subcolumn access\nsuch as `cast(msg.trace_id as string)`, while the equivalent bracket\naccess `msg[\u0027trace_id\u0027]` works. Dot access can leave an `Alias` around\nthe pruned subcolumn slot, and `CheckMatchExpression` rejected the\naliased slot.\n\nFix MATCH predicates on VARIANT dot subcolumn access such as\n`msg.trace_id` so they are accepted like equivalent bracket subcolumn\naccess."
    },
    {
      "commit": "d1060850bed233b273c638a1d688e340c57b8c03",
      "tree": "ff75dcc9c0388f442c06d7e7cef7ac2318574a8a",
      "parents": [
        "84e47b65088d716e2e05a775d7915ab004547bf1"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Thu May 28 14:12:44 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 14:12:44 2026 +0800"
      },
      "message": "[fix](show variables) Fix changed variable output in show variables (#63734)\n\nProblem Summary: SHOW VARIABLES WHERE is evaluated through an internal\nschema query. During planning of that internal query, FE may call\nsetVarOnce() and temporarily change session variables such as\ndisable_join_reorder. The schema scan then dumps those temporary values\nand reports them as Changed, even though they are not user-visible\nsession changes. This patch dumps variables from a reverted clone so\none-shot internal values do not affect SHOW VARIABLES output, while\npreserving the existing information_schema WHERE execution behavior."
    },
    {
      "commit": "84e47b65088d716e2e05a775d7915ab004547bf1",
      "tree": "4e7bd6833de0eefd04e8551c1d2c4fa7b2efed8c",
      "parents": [
        "df200d8cbe18642ba558b866ea8f331d0c38481a"
      ],
      "author": {
        "name": "yaoxiao",
        "email": "yx136264032@163.com",
        "time": "Thu May 28 13:47:24 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 13:47:24 2026 +0800"
      },
      "message": "[fix](function) Improve numerical robustness of cosine_distance / cosine_similarity (#62840)\n\n### What problem does this PR solve?\n\nTwo defensive hardening fixes in CosineDistance::distance and\nCosineSimilarity::distance to guarantee correct results across the full\nrange of valid float inputs.\n\nFix 1: Use double-precision intermediate when computing the norm\nBefore:\n\nreturn 1 - dot_prod / sqrt(squared_x * squared_y);\nAfter:\n\nconst double norm \u003d std::sqrt(static_cast\u003cdouble\u003e(squared_x) *\nstatic_cast\u003cdouble\u003e(squared_y));\nWhy: squared_x * squared_y is a float multiplication. When squared_x and\nsquared_y are both large (e.g. input elements around 1e19), the product\nexceeds FLT_MAX (~3.4e38) and overflows to +inf. Then sqrt(+inf) \u003d +inf\nand dot_prod / +inf \u003d 0, so two parallel vectors silently get\ncosine_distance \u003d 1.0 (should be 0.0) — a wrong result with no warning.\n\nFor typical L2-normalized embedding vectors this never triggers. But\ncosine_distance accepts arbitrary float* arrays, not just normalized\nembeddings, so the function should be safe for any finite float input.\nThe cost is two static_cast\u003cdouble\u003e ops; double\u0027s range (~1.8e308)\ncannot overflow on any finite float input.\n\nFor non-overflow inputs the result is bit-for-bit equivalent (verified\nby existing tests, which match the same static_cast\u003cfloat\u003e(34.0 /\nstd::sqrt(14.0 * 83.0)) formula).\n\nFix 2: Clamp cosine to [-1, 1]\nAfter:\n\nreturn std::clamp(static_cast\u003cfloat\u003e(dot_prod / norm), -1.0f, 1.0f);\nWhy: Float rounding can make the computed cosine slightly exceed 1.0 for\nidentical (or near-identical) vectors. 
For example with x \u003d y \u003d (0.1f,\n0.2f, 0.3f), accumulation rounding can yield cosine \u003d 1.0000001, then\n1.0f - cosine \u003d -1e-7, a negative cosine_distance that violates the\nmetric contract d \u003e\u003d 0 and may break downstream code (DCHECK(distance \u003e\u003d\n0), threshold filters, distance aggregation).\n\nstd::clamp is a one-op guarantee that costs nothing for in-range values.\n\n\nBE UT compile failure fix (independent of the cosine fix in this PR)\nProblem: after merging the latest master, BE UT fails to compile at functions_geo_test.cpp:375:\n\n\nerror: static assertion failed: assert_cast is redundant for the same\ntype\n\u0027!std::is_same_v\u003cColumnVector\u003cTYPE_BOOLEAN\u003e*,\nColumnVector\u003cTYPE_BOOLEAN\u003e*\u003e\u0027\nRoot cause: the combined side effects of three PRs on master:\n\n#63491 changed _null_map to the strongly typed ColumnUInt8::WrappedPtr\n#63059 added a static_assert that rejects same-type assert_cast\n#63049 (merged on 5/26) added new geo tests that still use a redundant cast written against the old interface\nFix: remove the redundant wrapper\n\n\n-\nassert_cast\u003cColumnUInt8*\u003e(nullable_input-\u003eget_null_map_column_ptr().get())-\u003einsert_value(0);\n+ nullable_input-\u003eget_null_map_column_ptr()-\u003einsert_value(0);\nget_null_map_column_ptr() now returns\nColumnUInt8::MutablePtr directly, and the semantics of -\u003einsert_value() are unchanged.\n\nImpact: a one-line change that only fixes the compile error; it does not touch test semantics or the cosine-related code.\n\n\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [ ] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. 
eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e\n\n---------\n\nCo-authored-by: yaoxiao \u003cyaoxiao@fosun.com\u003e"
    },
    {
      "commit": "df200d8cbe18642ba558b866ea8f331d0c38481a",
      "tree": "1a8943226829878a83eec11d9f5e7787d588881b",
      "parents": [
        "24cb2f77cb248e0fd6c5329136c4a0fce3c52bd4"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Thu May 28 11:38:29 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 11:38:29 2026 +0800"
      },
      "message": "[fix](hive table) Fill Hive meta cache when loading row count for queries (#63470)\n\n### What problem does this PR solve?\n\nrelated issue: close #63694\n\nHive external table row count estimation can read Hive Metastore\nmetadata without filling Doris\u0027 Hive external metadata cache. This makes\na normal query pay duplicate HMS metadata access in the same planning\nflow.\n\nThe problematic path is:\n\n1. A normal query asks the external table for row count through\n`ExternalTable.getRowCount()`.\n2. `ExternalRowCountCache` misses and calls\n`HMSExternalTable.fetchRowCount()`.\n3. If HMS table parameters do not contain row count and\n`enable_get_row_count_from_file_list` is enabled, `HMSExternalTable`\nestimates row count from file list.\n4. Before this PR, that estimation path always used\n`getAllPartitionsWithoutCache()` and `getFilesByPartitions(...,\nwithCache\u003dfalse, ...)`.\n5. Later in the same normal query, scan planning still needs partition\nand file metadata, so it reads the same HMS/file metadata again through\nthe normal cached scan path.\n\nThis behavior was originally useful for non-query metadata display\nrequests such as `show table status`, `show stats`, and\n`information_schema.tables`: those requests should not fill heavy Hive\nmetadata caches just because they display cached row count. However,\nnormal query planning is different. 
If row count estimation has already\nfetched partition and file metadata, filling the metadata cache avoids\nduplicated HMS reads in the following scan planning step.\n\n### How was it fixed?\n\nThis PR separates the two row-count loading modes with an explicit\n`fillMetaCache` flag:\n\n- Normal query row-count loading uses `fillMetaCache\u003dtrue`.\n- Cached row-count display paths such as `getCachedRowCount()` still use\n`fillMetaCache\u003dfalse`.\n- The default async row-count cache loader keeps the existing\nnon-filling behavior unless the caller explicitly requests cache\nfilling.\n- `HMSExternalTable` routes row-count file-list estimation through\ncached or non-cached Hive metadata APIs based on `fillMetaCache`.\n\nConcretely:\n\n- `ExternalTable.getRowCount()` now requests\n`ExternalRowCountCache.getCachedRowCount(..., true)`.\n- `ExternalTable.getCachedRowCount()` and\n`PluginDrivenExternalTable.getCachedRowCount()` request `false`.\n- `ExternalRowCountCache` loads row count through\n`ExternalTable.fetchRowCountWithMetaCache(fillMetaCache)`.\n- `HMSExternalTable.fetchRowCount()` remains the lightweight non-filling\npath.\n- `HMSExternalTable.fetchRowCountWithMetaCache(true)` fills Hive\npartition/file metadata cache while estimating row count from file list.\n\nThis keeps the previous optimization for show/stat display paths while\nallowing normal queries to reuse metadata fetched during row-count\nestimation.\n\n### Check List\n\n- Test:\n  - Unit Test: `ExternalRowCountCacheTest`\n  - Unit Test: `HMSExternalTableTest`\n- Behavior changed: Yes. Normal query row-count estimation for HMS\nexternal tables may now fill Hive metadata cache when it has to estimate\nrow count from file list. Non-query cached row-count display paths still\navoid filling heavy metadata caches.\n- Does this need documentation: No."
    },
    {
      "commit": "24cb2f77cb248e0fd6c5329136c4a0fce3c52bd4",
      "tree": "65d8e2341754f3a0422ea1036f4816e3580b6f9b",
      "parents": [
        "997685763c48873fbec1a89c1731e78b184ea063"
      ],
      "author": {
        "name": "bobhan1",
        "email": "baohan@selectdb.com",
        "time": "Thu May 28 11:01:01 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 11:01:01 2026 +0800"
      },
      "message": "[fix](cloud) Skip wait for async rowset warmup (#62764)\n\nProblem Summary: CloudWarmUpManager::warm_up_rowset still waited on a bthread condition variable when sync_wait_timeout_ms was non-positive. Submit those warmup tasks asynchronously and return immediately, while keeping the rowset meta alive for the background task and logging if rowset meta initialization fails.\n\nRelease note: None"
    },
    {
      "commit": "997685763c48873fbec1a89c1731e78b184ea063",
      "tree": "4a570dcf9f8fef22281d778207f6a3e487402dcc",
      "parents": [
        "48d62f4161d49a134757959e8aee7ab49ccd15d4"
      ],
      "author": {
        "name": "Xin Liao",
        "email": "liaoxin@selectdb.com",
        "time": "Thu May 28 10:47:41 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 10:47:41 2026 +0800"
      },
      "message": "[improvement](cloud) Enable packed file and empty rowset optimization by default (#63475)\n\nProblem Summary: Cloud mode kept packed file small-file merge and empty\nrowset metadata skipping disabled by default. This change enables\nenable_packed_file and skip_writing_empty_rowset_metadata by default so\nnew cloud deployments merge small files and avoid writing metadata for\nempty rowsets without extra configuration.\n\n### Release note\n\nEnable cloud packed file small-file merge and empty rowset metadata skip\noptimization by default."
    },
    {
      "commit": "48d62f4161d49a134757959e8aee7ab49ccd15d4",
      "tree": "0b414001124694579f22642af5fe151eab3e064b",
      "parents": [
        "67260ed932a096440e33694bd0cc4f51767bece6"
      ],
      "author": {
        "name": "yujun",
        "email": "yujun@selectdb.com",
        "time": "Thu May 28 10:33:22 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 10:33:22 2026 +0800"
      },
      "message": "[fix](regression) Wait row count before hot value analyze (#63758)\n\n### What problem does this PR solve?\n\nIssue Number: None\n\nRelated PR: None\n\nProblem Summary:\n\nThe hot value analyze regression test can run sample analyze before\nCloud table row count metadata is reported. In that state sample analyze\ntreats the table as empty and writes empty column statistics, making the\ntest flaky. This PR waits for SHOW DATA to report the inserted row count\nbefore running analyze on non-empty test tables.\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [x] Regression test\n    - [ ] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [x] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [x] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e\n\nTests:\n\n```\n./build.sh --fe\n./run-regression-test.sh --run -d statistics -s test_full_analyze_hot_value\ngit diff --check\n```"
    },
    {
      "commit": "67260ed932a096440e33694bd0cc4f51767bece6",
      "tree": "2547ab3f7b0948054e4feffc8743b08b33f75945",
      "parents": [
        "1284b254c0af06802da284df66269417c3152d97"
      ],
      "author": {
        "name": "hui lai",
        "email": "laihui@selectdb.com",
        "time": "Thu May 28 10:23:05 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 10:23:05 2026 +0800"
      },
      "message": "[feat](job) add per-job routine load metrics (#63576)\n\n### What problem does this PR solve?\n\nProblem Summary:\n\nBefore this change, FE only exposed aggregate routine load metrics, such\nas total loaded rows, error rows, received bytes, task execution time,\nprogress, lag, and aborted task count across all routine load jobs.\nThese metrics were useful for observing the whole FE, but they could not\nidentify which routine load job contributed to a spike, lag, or abnormal\nerror/task count.\n\nThis change adds per-job routine load metrics. Each metric is exported\nwith `job_id` and `job_name` labels, so users can inspect the status of\na single routine load job from the FE metrics endpoint.\n\nThe new metrics are:\n\n- `doris_fe_routine_load_per_job_total_rows`\n- `doris_fe_routine_load_per_job_error_rows`\n- `doris_fe_routine_load_per_job_received_bytes`\n- `doris_fe_routine_load_per_job_task_execute_time`\n- `doris_fe_routine_load_per_job_task_execute_count`\n- `doris_fe_routine_load_per_job_progress`\n- `doris_fe_routine_load_per_job_lag`\n- `doris_fe_routine_load_per_job_abort_task_num`\n\nExample query from FE metrics endpoint:\n\n```shell\ncurl http://\u003cfe_host\u003e:\u003chttp_port\u003e/metrics | grep routine_load_per_job\n```\n\nExample Prometheus query:\n```\ndoris_fe_routine_load_per_job_lag\n```"
    },
    {
      "commit": "1284b254c0af06802da284df66269417c3152d97",
      "tree": "5831dbca532880b80b59f01ed09921655d0ceabb",
      "parents": [
        "5e9126cf5b2a162c2e132589a5fc4b53a605e04b"
      ],
      "author": {
        "name": "feiniaofeiafei",
        "email": "moailing@selectdb.com",
        "time": "Thu May 28 10:04:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 10:04:37 2026 +0800"
      },
      "message": "[fix](test) delete unstable case in agg_strategy (#63726)"
    },
    {
      "commit": "5e9126cf5b2a162c2e132589a5fc4b53a605e04b",
      "tree": "08cc361c6f94149569c38da526d5b5f5df9aba41",
      "parents": [
        "2e72603618ca5ce998184bcc95285bf30405f5e0"
      ],
      "author": {
        "name": "Socrates",
        "email": "suyiteng@selectdb.com",
        "time": "Thu May 28 09:34:41 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 09:34:41 2026 +0800"
      },
      "message": "[fix](fe) Normalize default HDFS paths in LocationPath (#63476)\n\nIceberg tables written through Hadoop catalog can store data file paths\nwithout a URI scheme, for example\n`/hadoop_catalog/db/tbl/data/file.parquet`. Doris should normalize these\npaths with the catalog `fs.defaultFS` before creating scan ranges.\n\nThe Iceberg `LocationPath` cache path kept the original blank schema\nafter normalization and did not derive the schema from the normalized\nURI in the cached fallback path. As a result, partitioned table planning\ncould fail with `Invalid location, missing authority`, and\nnon-partitioned scans could pass an invalid file type or fs name to BE.\n\nThis patch derives the schema from the normalized URI when the original\npath has no scheme and keeps cached `LocationPath` creation consistent\nwith full parsing."
    },
    {
      "commit": "2e72603618ca5ce998184bcc95285bf30405f5e0",
      "tree": "d1b21b7d8f4bab3dbef895504c76020c2dbd2cf1",
      "parents": [
        "c24d454f15cee2d937ef4749270a3ecb449eafe6"
      ],
      "author": {
        "name": "Chenyang Sun",
        "email": "sunchenyang@selectdb.com",
        "time": "Wed May 27 20:51:39 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 27 20:51:39 2026 +0800"
      },
      "message": "[fix](variant) preserve TIMESTAMPTZ values in sparse path (#63522)\n\nAdd the missing write_one_cell_to_binary override mirroring\nDataTypeDateTimeV2SerDe so the writer also emits the scale byte. Reader\nis already correct.\n\n\n### What problem does this PR solve?\n\nIssue Number: close #xxx\n\nRelated PR: #xxx\n\nProblem Summary:\n\n### Release note\n\nNone\n\n### Check List (For Author)\n\n- Test \u003c!-- At least one of them must be included. --\u003e\n    - [x] Regression test\n    - [x] Unit Test\n    - [ ] Manual test (add detailed scripts or steps below)\n    - [ ] No need to test or manual test. Explain why:\n- [ ] This is a refactor/code format and no logic has been changed.\n        - [ ] Previous test can cover this change.\n        - [ ] No code files have been changed.\n        - [ ] Other reason \u003c!-- Add your reason?  --\u003e\n\n- Behavior changed:\n    - [ ] No.\n    - [ ] Yes. \u003c!-- Explain the behavior change --\u003e\n\n- Does this need documentation?\n    - [ ] No.\n- [ ] Yes. \u003c!-- Add document PR link here. eg:\nhttps://github.com/apache/doris-website/pull/1214 --\u003e\n\n### Check List (For Reviewer who merge this PR)\n\n- [ ] Confirm the release note\n- [ ] Confirm test cases\n- [ ] Confirm document\n- [ ] Add branch pick label \u003c!-- Add branch pick label that this PR\nshould merge into --\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "c24d454f15cee2d937ef4749270a3ecb449eafe6",
      "tree": "583f67ef2e36ddab5f2883692dd257ec75f2df90",
      "parents": [
        "05690fce06f6daa119ca2b77c014865215cdc540"
      ],
      "author": {
        "name": "Jack",
        "email": "jiangkai@selectdb.com",
        "time": "Wed May 27 20:29:42 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 27 20:29:42 2026 +0800"
      },
      "message": "[fix](search) reject Lucene-syntax SEARCH on columns without inverted index (#63637)\n\n## Proposed changes\n\nIssue Number: close #N/A \n\n### What problem does this PR solve?\n\nSEARCH (Lucene syntax) predicates against columns that have no inverted\nindex silently fall back to an empty bitmap on BE (`vsearch.cpp` and\n`function_search.cpp` only log a WARNING then `return Status::OK()` with\nan empty result), making the query look like *no rows matched*. That is\nindistinguishable from a successful query that simply found nothing and\nmisleads users.\n\nThis PR adds a planning-time check in `RewriteSearchToSlots`, matching\nthe existing \"column does not exist\" behavior — fail fast with a clear\n`AnalysisException` instead of letting BE silently return FALSE.\n\n- **Normal columns**: require `OlapTable.getInvertedIndex(column, null)\n!\u003d null`.\n- **Variant subcolumns** (`parent.path`): require any `INVERTED` index\nwhose first column equals the parent variant column; the concrete\nsubcolumn binding is still resolved per-segment in BE, consistent with\nthe `is_variant_sub` branch in `function_search.cpp`.\n\nAlso hardens `OlapTable.getInvertedIndex` against NPE when the table has\nno `TableIndexes` set (returns `null` instead of dereferencing).\n\nError message example:\n\n```\nField \u0027msg_body\u0027 has no inverted index, cannot be used in search: msg_body:error.\nCreate an inverted index on the column first (ALTER TABLE ... ADD INDEX ... USING INVERTED).\n```"
    }
  ],
  "next": "05690fce06f6daa119ca2b77c014865215cdc540"
}
