)]}'
{
  "log": [
    {
      "commit": "cc9bb8e165be69c05e615e008d1b9e1a4d26b187",
      "tree": "d7008f2f6b00659b8bc96fb50d8c376a95255ebf",
      "parents": [
        "d13301ccd0ad841e1fb4315e53a20b81f4ac2cec"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Fri Jun 05 17:39:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 17:39:20 2026 +0800"
      },
      "message": "[python] Fix manifest read failure when _WRITE_COLS contains system fields (#8131)\n\n### Purpose\nWhen reading a table whose data files have `_WRITE_COLS` containing\nsystem fields (e.g. `_ROW_ID`, `_SEQUENCE_NUMBER`), the read\n  fails with:\n  KeyError: \u0027_ROW_ID\u0027\n\nAligns with the Java-side fix in #7797 — skip metadata fields that are\nnot in the table schema when resolving value stats fields from\n`_WRITE_COLS`.\n\n  ## Test\n\n  - `test_read_write_cols_with_system_field`"
    },
    {
      "commit": "d13301ccd0ad841e1fb4315e53a20b81f4ac2cec",
      "tree": "6df0adfcb5ad7d90474482ca047a86da4587e628",
      "parents": [
        "3b639af55da0d9bbf0bc5e9df1e85253b1a2c9b8"
      ],
      "author": {
        "name": "Stefanietry",
        "email": "zhou1172026225@gmail.com",
        "time": "Fri Jun 05 15:52:09 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 15:52:09 2026 +0800"
      },
      "message": "[spark] support distributed execution of vector search on spark (#8108)\n\nPurpose: Currently, vector search operation is executed on a single node\nwithin the driver, which may lead to performance bottlenecks when\ndealing with large amounts of data. This issue aims to implement a\ndistributed execution capability."
    },
    {
      "commit": "3b639af55da0d9bbf0bc5e9df1e85253b1a2c9b8",
      "tree": "8c3f1f643b7591bef7cfee82b640ad7415e42a64",
      "parents": [
        "08ce6b26be8366eb0172643379a15a2c6ff6cf27"
      ],
      "author": {
        "name": "Stefanietry",
        "email": "zhou1172026225@gmail.com",
        "time": "Fri Jun 05 11:42:40 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 11:42:40 2026 +0800"
      },
      "message": "[spark] support persist source data to avoid loading data repeatedly (#8081)"
    },
    {
      "commit": "08ce6b26be8366eb0172643379a15a2c6ff6cf27",
      "tree": "15746100588212344f57b7b3f9ff10231669ea58",
      "parents": [
        "66c2b9caeabdb5c5b9c54bbcd5a4d75984d17981"
      ],
      "author": {
        "name": "junmuz",
        "email": "4795269+junmuz@users.noreply.github.com",
        "time": "Fri Jun 05 03:20:03 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 10:20:03 2026 +0800"
      },
      "message": "Add lastCommittedSnapshotId commit metric and document missing metrics (#7589)"
    },
    {
      "commit": "66c2b9caeabdb5c5b9c54bbcd5a4d75984d17981",
      "tree": "4336e48435968e2a2cb483898af64cb03c5fe9f6",
      "parents": [
        "3eb5a4da5f631f4dff661dcae500f32efbd4309f"
      ],
      "author": {
        "name": "Jiajia Li",
        "email": "plusplusjiajia@alibaba-inc.com",
        "time": "Thu Jun 04 16:51:01 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 16:51:01 2026 +0800"
      },
      "message": "[arrow] Fix TIMESTAMP_LTZ Arrow timezone to use UTC instead of system default (#7364)\n\nLocalZonedTimestampType stores UTC timestamps by definition. However,\nArrowFieldTypeConversion used ZoneId.systemDefault() as the\nArrowTimestamp timezone."
    },
    {
      "commit": "3eb5a4da5f631f4dff661dcae500f32efbd4309f",
      "tree": "cab2af6fc6fea3cca21945f844365fccef0c7219",
      "parents": [
        "68cf3bca10f2891cee2c1c9769669e0d1cd3765a"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Thu Jun 04 16:36:30 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 16:36:30 2026 +0800"
      },
      "message": "[python][ray] Preserve schema for empty reads (#8118)\n\nThe top-level Ray `read_paimon` API planned reads through\n`RayDatasource`. When a table scan produced no splits,\n`RayDatasource.get_read_tasks()` returned no read tasks, so Ray could\ncreate an empty dataset without the Paimon table schema.\n\nThis was inconsistent with `TableRead.to_ray()`, which already returns\nan empty Arrow-backed Ray dataset with the planned read schema.\n\nThis PR makes `read_paimon` use the planned `read_type` to build an\nempty Arrow table when there are no splits, so empty reads preserve\nschema and projection. It also lazily imports `ray.data` and reports an\nactionable `pypaimon[ray]` install hint when Ray is missing."
    },
    {
      "commit": "68cf3bca10f2891cee2c1c9769669e0d1cd3765a",
      "tree": "e58a067e010a3de559bb9bd6751911ee8b457332",
      "parents": [
        "3721ae0f88c4b739566215f0af7db96238d1a620"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Thu Jun 04 16:29:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 16:29:32 2026 +0800"
      },
      "message": "[core] Support snapshot-based sequence ordering for primary-key tables (#7832)"
    },
    {
      "commit": "3721ae0f88c4b739566215f0af7db96238d1a620",
      "tree": "076d40f15d153399780da80957bfcfbb52600c08",
      "parents": [
        "a1d255bf8da9d1d58da0c4748ca6149107f67f4c"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Thu Jun 04 14:30:11 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 14:30:11 2026 +0800"
      },
      "message": "[python] Fix tantivy full-text index schema mismatch  (#8113)"
    },
    {
      "commit": "a1d255bf8da9d1d58da0c4748ca6149107f67f4c",
      "tree": "475cf0585302552724ab5b4904f2efc27f7d4b6b",
      "parents": [
        "2b4f24ff39432da46e28884b423a74d29e8cf7b5"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Thu Jun 04 14:19:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 14:19:36 2026 +0800"
      },
      "message": "[python] Add StartupMode enum and scan.mode option to CoreOptions (#7900)"
    },
    {
      "commit": "2b4f24ff39432da46e28884b423a74d29e8cf7b5",
      "tree": "7268a7af133dc94262dffbf4a50cd9a4755de1b9",
      "parents": [
        "ba4d76da89cfd4f2015479a3c5436901d9a281de"
      ],
      "author": {
        "name": "Arnav Balyan",
        "email": "60175178+ArnavBalyan@users.noreply.github.com",
        "time": "Thu Jun 04 11:26:16 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:56:16 2026 +0800"
      },
      "message": "[hive] Fix insert into static partitions on managed Paimon tables (#7824)"
    },
    {
      "commit": "ba4d76da89cfd4f2015479a3c5436901d9a281de",
      "tree": "7ab0d21e8ab8d1a8fa578fc6dd8a358f16032ba4",
      "parents": [
        "4e54917b06cd603d9e6f3cbb79fec0694b3ae69e"
      ],
      "author": {
        "name": "umi",
        "email": "55790489+discivigour@users.noreply.github.com",
        "time": "Thu Jun 04 13:54:46 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:54:46 2026 +0800"
      },
      "message": "[core] Support manifest sort feature when commit (#7842)"
    },
    {
      "commit": "4e54917b06cd603d9e6f3cbb79fec0694b3ae69e",
      "tree": "c5e8020ef081aec9b1819759e865b02cf72a32b4",
      "parents": [
        "40eadc2f0133d2859f85cc6b1e0de35433234846"
      ],
      "author": {
        "name": "duanyyyyyyy",
        "email": "139062392+duanyyyyyyy@users.noreply.github.com",
        "time": "Thu Jun 04 13:53:43 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:53:43 2026 +0800"
      },
      "message": "[core] Fix DataEvolutionFileStoreScan schema-evolution filtering (#8084)"
    },
    {
      "commit": "40eadc2f0133d2859f85cc6b1e0de35433234846",
      "tree": "e82415314c228b9212cf2dbc00c8322845511d7c",
      "parents": [
        "d78babb99fe7d25a72b002624ee0570a5a2c47c9"
      ],
      "author": {
        "name": "umi",
        "email": "55790489+discivigour@users.noreply.github.com",
        "time": "Thu Jun 04 13:52:19 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:52:19 2026 +0800"
      },
      "message": "[python] Support BlobView feature (#8021)\n\n- Add Python BlobViewStruct / BlobView wire-format support and the\nblob-view-field option.\n- Store descriptor/view BLOB fields inline, validate bad inline field\nconfiguration and payloads, and avoid writing new .blob files for view\nfields.\n- Resolve blob-view fields during reads through catalog-aware lookup,\nreturning bytes by default or upstream BlobDescriptor bytes when\nblob-as-descriptor\u003dtrue."
    },
    {
      "commit": "d78babb99fe7d25a72b002624ee0570a5a2c47c9",
      "tree": "dc030a495d5ebfba4cc54f779429f7c40a4d4bf5",
      "parents": [
        "1d56d2d235b2f63882daf5b6c8c22a29c9ee40a0"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Thu Jun 04 13:51:13 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:51:13 2026 +0800"
      },
      "message": "[python][ray] Support partial SET and INSERT in merge_into (#8085)"
    },
    {
      "commit": "1d56d2d235b2f63882daf5b6c8c22a29c9ee40a0",
      "tree": "f36040c60d02f29d17f4c727232fbaab5605f706",
      "parents": [
        "fbab26b1090192d4e467855e90b85ffc576da03a"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Thu Jun 04 13:51:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:51:04 2026 +0800"
      },
      "message": "[core] Fix flaky duplicate file discard test (#8106)"
    },
    {
      "commit": "fbab26b1090192d4e467855e90b85ffc576da03a",
      "tree": "3728ca5d659dd805a84f748e7104da1d1777f7a1",
      "parents": [
        "0824fe77b16cc50a2988f1322b4a8a33a3691874"
      ],
      "author": {
        "name": "chaoyang",
        "email": "chaoyang@apache.org",
        "time": "Thu Jun 04 13:50:51 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:50:51 2026 +0800"
      },
      "message": "[python] In-memory merge buffer for primary-key writer (#7759)"
    },
    {
      "commit": "0824fe77b16cc50a2988f1322b4a8a33a3691874",
      "tree": "b8bf095d301220db121d76ac7e6dca95afa9c630",
      "parents": [
        "0eb9011fc824d39b9827f994a561c52707f62161"
      ],
      "author": {
        "name": "Kerwin Zhang",
        "email": "xiyu.zk@alibaba-inc.com",
        "time": "Thu Jun 04 13:44:30 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:44:30 2026 +0800"
      },
      "message": "[spark] Support V2 DML for row-tracking append-only tables (#8094)"
    },
    {
      "commit": "0eb9011fc824d39b9827f994a561c52707f62161",
      "tree": "102a0f0d36cd6a7655fcb8d851d48746ff875f45",
      "parents": [
        "65846f8ee6bfcb5eaeb0653469ed768a841af879"
      ],
      "author": {
        "name": "Colin",
        "email": "hansichan.crypto@gmail.com",
        "time": "Thu Jun 04 13:33:07 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 13:33:07 2026 +0800"
      },
      "message": "[python] Support JDBC catalog (#7720)\n\nSupport JDBC catalog in PyPaimon. This adds a Python JDBC catalog\nimplementation that uses the same catalog metadata tables as Java Paimon\nJDBC catalog: `paimon_tables`, `paimon_database_properties`, and\n`paimon_table_properties`.\n\nThe implementation supports SQLite with the Python standard library and\ndynamically supports MySQL/PostgreSQL when a corresponding Python DB-API\ndriver is installed. Table data and schema files continue to use\nexisting PyPaimon `FileIO` and `SchemaManager` behavior."
    },
    {
      "commit": "65846f8ee6bfcb5eaeb0653469ed768a841af879",
      "tree": "07c3333d6624f08ebaf80493f496f182a7c75cfd",
      "parents": [
        "b5cbf23d0398693d5efbc204e58aac80bb16edc8"
      ],
      "author": {
        "name": "Rhett CfZhuang",
        "email": "dark.momo985@gmail.com",
        "time": "Thu Jun 04 12:18:54 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 12:18:54 2026 +0800"
      },
      "message": "[core] Add validation to prevent primary key in sequence-group (#7052) (#7656)\n\nWhen a primary key field is configured in a sequence-group of\npartial-update merge engine, it causes Parquet decoding failures during\ncompaction because the key field may be set to null. This commit adds\nearly validation at configuration parsing time to reject such invalid\nconfigurations with a clear error message."
    },
    {
      "commit": "b5cbf23d0398693d5efbc204e58aac80bb16edc8",
      "tree": "7a3a34220d7a59084b09d8e414ed619e1ab878cc",
      "parents": [
        "4d8000bc8537cb2f07d1bc5cd1897ca84a27b5a3"
      ],
      "author": {
        "name": "Silas",
        "email": "yhlunar@qq.com",
        "time": "Thu Jun 04 12:15:13 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 12:15:13 2026 +0800"
      },
      "message": "[spark] SparkFilterConverter: support AlwaysTrue/False, fix silent NaN drop (#8060)"
    },
    {
      "commit": "4d8000bc8537cb2f07d1bc5cd1897ca84a27b5a3",
      "tree": "d2f0c79d3472267b3933e2e9a66146b003b953bd",
      "parents": [
        "8186094f9b14fcf69317972b6dbfee31d30e6d3f"
      ],
      "author": {
        "name": "Kerwin Zhang",
        "email": "xiyu.zk@alibaba-inc.com",
        "time": "Thu Jun 04 12:14:43 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 12:14:43 2026 +0800"
      },
      "message": "[spark] Harden dynamic overwrite against optimized child plans (#8052)\n\n`PaimonDynamicPartitionOverwriteCommand` exposes its child query to\nSpark optimizer through `V2WriteCommand`, but later wraps the same query\nback into a Dataset in `run()` before passing it to\n`WriteIntoPaimonTable`. This is fragile when the child query has already\nbeen optimized by Spark. The optimized plan may contain\noptimizer/planner-side placeholders, such as `DynamicPruningSubquery`,\nwhich are not ideal to expose again to writer-side Dataset operations.\n\nThis PR makes the command-to-writer boundary more robust for the dynamic\npartition overwrite fallback path. Before passing the query to\n`WriteIntoPaimonTable`, it converts the child query into an RDD-backed\nDataFrame via `createNewDataFrame(createDataset(...))`. As a result, the\nwriter consumes a clean logical plan instead of directly consuming the\npossibly optimized child plan."
    },
    {
      "commit": "8186094f9b14fcf69317972b6dbfee31d30e6d3f",
      "tree": "9c3d87fe91edbc79161ec96d257c05944a3898aa",
      "parents": [
        "4e3c4b82dc4fc42c7b9ab71a94ff1f8d308f67be"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Thu Jun 04 12:13:45 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 12:13:45 2026 +0800"
      },
      "message": "[python][ray] Pin merge source table snapshot (#8110)\n\nRay merge-into already pins target reads to the base snapshot, but\nPaimon source tables were still normalized through `read_paimon` without\nan explicit snapshot. Because Ray Dataset execution is lazy, source\nplanning could otherwise observe a later table snapshot than the one\nseen during merge preparation.\n\nThis PR captures the latest snapshot id for string source tables during\n`_prepare` and passes it to `read_paimon`, so the source side uses a\nstable snapshot throughout merge planning and execution."
    },
    {
      "commit": "4e3c4b82dc4fc42c7b9ab71a94ff1f8d308f67be",
      "tree": "bd766b4cd98306a164877557a65cad8deaf4f560",
      "parents": [
        "4a5462d47ed8cb7c0f2810cd91126b473488af60"
      ],
      "author": {
        "name": "zhoulii",
        "email": "zhouli.dev@foxmail.com",
        "time": "Thu Jun 04 12:13:23 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 12:13:23 2026 +0800"
      },
      "message": "[python] Align ExternalPathProvider with Java multi-strategy support. (#8104)"
    },
    {
      "commit": "4a5462d47ed8cb7c0f2810cd91126b473488af60",
      "tree": "a0cdbbffb6d48eb6a2df994d6b377eb117bf72c2",
      "parents": [
        "6e14824885462f0fdb9db33361e67aae51d30fa6"
      ],
      "author": {
        "name": "YeJunHao",
        "email": "41894543+leaves12138@users.noreply.github.com",
        "time": "Thu Jun 04 11:01:50 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 11:01:50 2026 +0800"
      },
      "message": "[core] Filter side files in BTree global index scans (#8109)\n\nBTree global index scan planning should avoid unnecessary dedicated side\nfiles such as blob and vector-store files. However, pruning by\n`readType` is too broad for data-evolution tables: old normal data files\nmay not contain a newly added indexed column, but they still need to be\nscanned and indexed with a NULL key."
    },
    {
      "commit": "6e14824885462f0fdb9db33361e67aae51d30fa6",
      "tree": "1d578bc1f387de5414ceac3bf3d3cce501de8ce4",
      "parents": [
        "4d0a6515670767c6db22823201e33d33a19b7b56"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Wed Jun 03 21:45:56 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 21:45:56 2026 +0800"
      },
      "message": "[flink][cdc] Fix schema event position in source reader (#8099)"
    },
    {
      "commit": "4d0a6515670767c6db22823201e33d33a19b7b56",
      "tree": "dcae154218a70fadfdcad1d660457faeaf2f90a5",
      "parents": [
        "c864884bc92bab09e2b39bddaeab5e61483ba7ad"
      ],
      "author": {
        "name": "huangxiaoping",
        "email": "1754789345@qq.com",
        "time": "Wed Jun 03 21:45:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 21:45:37 2026 +0800"
      },
      "message": "[hive] Avoid treating empty partitioned tables as unpartitioned during migration (#8100)"
    },
    {
      "commit": "c864884bc92bab09e2b39bddaeab5e61483ba7ad",
      "tree": "d35615a3e47e12dac53749f698a1027d08e7ba5c",
      "parents": [
        "4a71298bcbc4bec42ba404d114161b8440ea2a95"
      ],
      "author": {
        "name": "umi",
        "email": "55790489+discivigour@users.noreply.github.com",
        "time": "Wed Jun 03 21:17:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 21:17:20 2026 +0800"
      },
      "message": "[python] Push limit down to the reader layer for append table (#8102)"
    },
    {
      "commit": "4a71298bcbc4bec42ba404d114161b8440ea2a95",
      "tree": "8ef492ab4174b7073ccce45d455dbeb51884ae90",
      "parents": [
        "e4d0573aed02e341bb8fc6411a5280d7ed4db2b5"
      ],
      "author": {
        "name": "Faiz",
        "email": "wxy407679@antgroup.com",
        "time": "Wed Jun 03 19:42:35 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 19:42:35 2026 +0800"
      },
      "message": "[python] introduce BlobConsumer mirroring Java module (#8105)"
    },
    {
      "commit": "e4d0573aed02e341bb8fc6411a5280d7ed4db2b5",
      "tree": "95c1112a217f9490c5264a85ab919a0ac37a1e5f",
      "parents": [
        "5952c0105a7743c04910aa52ba736959bc2e957b"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Wed Jun 03 19:22:19 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 19:22:19 2026 +0800"
      },
      "message": "[python][ray] Ray merge into support condition (#8076)"
    },
    {
      "commit": "5952c0105a7743c04910aa52ba736959bc2e957b",
      "tree": "8964c5ff1e9e9594fddac5d349a3a9cd3d2568cb",
      "parents": [
        "9a31504aaf8e4ee293aee80c65a403bf586a10ca"
      ],
      "author": {
        "name": "YeJunHao",
        "email": "41894543+leaves12138@users.noreply.github.com",
        "time": "Wed Jun 03 16:52:48 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 16:52:48 2026 +0800"
      },
      "message": "[core] Require database for blob view serialization (#8095)"
    },
    {
      "commit": "9a31504aaf8e4ee293aee80c65a403bf586a10ca",
      "tree": "3bfd8c0fb8f700235bc309f0de96b86f17d0ffdf",
      "parents": [
        "2093598e1f532bc905b485bd73478a92265325a9"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Wed Jun 03 16:45:51 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 16:45:51 2026 +0800"
      },
      "message": "[python] Integrate paimon-mosaic format into PyPaimon (#8098)"
    },
    {
      "commit": "2093598e1f532bc905b485bd73478a92265325a9",
      "tree": "4d7fe25c8e18377db3fda036aa9569637d61b39a",
      "parents": [
        "dd3e67e85a6c80a65a4143f1ca2eaaa1123058b3"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Wed Jun 03 16:39:18 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 16:39:18 2026 +0800"
      },
      "message": "[python] Fix upsert row_id validation failure on tables with row_id holes (#8092)"
    },
    {
      "commit": "dd3e67e85a6c80a65a4143f1ca2eaaa1123058b3",
      "tree": "d6fce0850c5df367431e71c32c053078a103746f",
      "parents": [
        "a993a212d579759624a08cb294b05ed5d99f7c05"
      ],
      "author": {
        "name": "Stefanietry",
        "email": "zhou1172026225@gmail.com",
        "time": "Wed Jun 03 16:24:29 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 16:24:29 2026 +0800"
      },
      "message": "[spark] fix build paimon scan of process vector search for spark3.2 (#8089)"
    },
    {
      "commit": "a993a212d579759624a08cb294b05ed5d99f7c05",
      "tree": "1e99bb9462aff2fed9d7b2c8eda2b754fdfe5df3",
      "parents": [
        "0f837d1599d867eafc021d8e4a55b0084866e59b"
      ],
      "author": {
        "name": "jerry",
        "email": "jinglining0@gmail.com",
        "time": "Wed Jun 03 15:07:39 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 15:07:39 2026 +0800"
      },
      "message": "[fix] fix lumina package repo (#8097)"
    },
    {
      "commit": "0f837d1599d867eafc021d8e4a55b0084866e59b",
      "tree": "b7ed3ceec9ce6a6a2cc838669cba609e6b86b44e",
      "parents": [
        "a8e333c4676f2d726dc749cf8dadd2c8b954a3ac"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Wed Jun 03 14:12:01 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 14:12:01 2026 +0800"
      },
      "message": "[flink][cdc] Keep table-aware split type during checkpoint (#8093)\n\n`TableAwareFileStoreSourceSplit` extends `FileStoreSourceSplit`, but it\ninherited `updateWithRecordsToSkip` from the parent class. During reader\ncheckpointing, `FileStoreSourceSplitState.toSourceSplit()` calls that\nmethod and returned a plain `FileStoreSourceSplit`, dropping the CDC\ntable metadata and causing CDC reader state restoration to cast the\nsplit back to `TableAwareFileStoreSourceSplit`.\n\nThis PR overrides `updateWithRecordsToSkip` in\n`TableAwareFileStoreSourceSplit` so checkpointed active splits keep\ntheir table-aware type and preserve `identifier`, `lastSchemaId`, and\n`schemaId`."
    },
    {
      "commit": "a8e333c4676f2d726dc749cf8dadd2c8b954a3ac",
      "tree": "dd8b66ffd5fd58caa40cfb53f231b0f696c9b3b5",
      "parents": [
        "c77a58e5a0856cc795d877c06977ef5d9e6e9142"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Wed Jun 03 13:09:24 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 13:09:24 2026 +0800"
      },
      "message": "[format] Add paimon-mosaic module with reader and writer (#7917)\n\nSee https://paimon.apache.org/docs/mosaic/\n\nIntroduces the Mosaic file format integration for Paimon with:\n- MosaicRecordsReader: row-group level predicate filtering using\nstatistics, column projection, and correct returnedPosition tracking\n- MosaicRecordsWriter: BundleFormatWriter with writerMetadata() support\nfor in-memory stats capture (avoids re-reading files on object stores)\n- MosaicSimpleStatsExtractor: stats extraction from file or\nwriterMetadata, with SimpleColStatsCollector integration\n- MosaicObjects: byte[] to Paimon object conversion for all supported\ntypes\n- Comprehensive test suite (6 test classes covering unit and integration\ntests)"
    },
    {
      "commit": "c77a58e5a0856cc795d877c06977ef5d9e6e9142",
      "tree": "f80b2ec6de35ab0b62b4595f7ad682a20d500621",
      "parents": [
        "2476815121ca1c1ffc260d5cb1c5febccdb913a9"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Wed Jun 03 12:12:58 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 12:12:58 2026 +0800"
      },
      "message": "[flink] Fix flaky ConsumerActionITCase (#8086)"
    },
    {
      "commit": "2476815121ca1c1ffc260d5cb1c5febccdb913a9",
      "tree": "6b1d559b41ffec4b5e9a9756a80627da0db400c6",
      "parents": [
        "b839753f46d8d447ab4ba5f8770e1ca13c0a5288"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Wed Jun 03 12:12:44 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 12:12:44 2026 +0800"
      },
      "message": "[tantivy] Support configurable full-text analyzers (#8074)\n\nThis PR expands Tantivy full-text global index tokenizer support into a\nconfigurable analyzer pipeline. It keeps the existing ngram and Jieba\nsupport, adds common LanceDB-style tokenizer/filter options, and wires\nthe same metadata through Java, Rust JNI, and PyPaimon readers."
    },
    {
      "commit": "b839753f46d8d447ab4ba5f8770e1ca13c0a5288",
      "tree": "8f8eae7b902c2a4e57efd5b7e530ae8c274237eb",
      "parents": [
        "38051b40c97963a0b5950bbb72c84c11eadeecb1"
      ],
      "author": {
        "name": "chaoyang",
        "email": "chaoyang@apache.org",
        "time": "Wed Jun 03 10:28:14 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 10:28:14 2026 +0800"
      },
      "message": "[python] Add HDFS native FileIO backend (no Hadoop install required) (#8031)\n\nIntroduces HdfsNativeFileIO backed by the hdfs-native protocol client\n(Rust + PyO3)\n\nDefault backend for hdfs:// and viewfs:// switches to native; the\nPyArrow / libhdfs path is kept, with auto-fallback when hdfs-native is\nunavailable (e.g. on Windows or when the extra is not installed)."
    },
    {
      "commit": "38051b40c97963a0b5950bbb72c84c11eadeecb1",
      "tree": "a87613349ccf75ead4b88680d6b81599452a18ac",
      "parents": [
        "5d4433360793ca48f34e90864983ef7c52d94bc9"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Wed Jun 03 08:58:10 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 03 08:58:10 2026 +0800"
      },
      "message": "[python][ray] Honor partition overwrite in write_ray (#8088)\n\n`TableWrite.write_ray()` previously did not carry builder-level\noverwrite partitions into the Ray datasink. As a result,\n`table.new_batch_write_builder().overwrite({...}).new_write().write_ray(...)`\nwrote through Ray without the configured partition overwrite contract,\nwhile `overwrite\u003dTrue` only supported full-table overwrite.\n\nThis PR carries the builder static partition into `TableWrite`, forwards\nit to `PaimonDatasink`, and applies the same overwrite partition on both\nRay write tasks and the driver-side commit path."
    },
    {
      "commit": "5d4433360793ca48f34e90864983ef7c52d94bc9",
      "tree": "ab5fee507354f1398b405754e9667a9cbe9b46ac",
      "parents": [
        "5adb6c9d4cb7dfa102c4dd3da2d0e5de0674d75e"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Tue Jun 02 23:29:43 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 23:29:43 2026 +0800"
      },
      "message": "[core][python] Fix blob updates and compaction (#8077)\n\nFix BLOB column updates in data-evolution append tables across Java and\nPython, and make BLOB compaction handle updated multi-version BLOB\nfiles. Unchanged BLOB values are now represented with placeholders and\nresolved from older BLOB files during reads and compaction."
    },
    {
      "commit": "5adb6c9d4cb7dfa102c4dd3da2d0e5de0674d75e",
      "tree": "91e92643b39a5ab6bcfab0d61cc0b3290af25863",
      "parents": [
        "4eafdcc48ff4346609383d428d06d0db674d1ba9"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Tue Jun 02 23:27:27 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 23:27:27 2026 +0800"
      },
      "message": "[spark] Support ON_ERROR \u003d CONTINUE / SKIP_FILE in COPY INTO (#8062)\n\nThis is part of #8005.\n\nCOPY INTO previously only supported `ON_ERROR \u003d ABORT_STATEMENT`: any\nparse or\ncast error aborted the entire command. In production data-loading\npipelines a\nsingle malformed row or file would then fail the whole batch, which is\noften\ntoo strict. This adds two error-tolerant modes:\n\n- `CONTINUE` — skip bad rows and load the rest (row-level tolerance).\n- `SKIP_FILE` — skip any file that contains an error, all-or-nothing per\nfile.\n\n`ABORT_STATEMENT` remains the default, so existing behavior is\nunchanged."
    },
    {
      "commit": "4eafdcc48ff4346609383d428d06d0db674d1ba9",
      "tree": "ae510722918f76e2933bd299ea84ddad158a62d9",
      "parents": [
        "3b1f9cba212476060467bbf83f094aba632c4cc7"
      ],
      "author": {
        "name": "Juntao Zhang",
        "email": "juntzhang@foxmail.com",
        "time": "Tue Jun 02 23:18:12 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 23:18:12 2026 +0800"
      },
      "message": "[spark] Align spark.paimon.branch option with explicit branch syntax for chain tables (#8016)"
    },
    {
      "commit": "3b1f9cba212476060467bbf83f094aba632c4cc7",
      "tree": "77f58d7c12f0958e1c3ff135b6508ee8aff7d80e",
      "parents": [
        "0d1589a63efa7b34eb6d8336ca13fddfef97a1dd"
      ],
      "author": {
        "name": "Weitai Li",
        "email": "l8261793@gmail.com",
        "time": "Tue Jun 02 23:16:13 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 23:16:13 2026 +0800"
      },
      "message": "[spark] Support split-granularity bin packing for data evolution tables (#8072)\n\nSupport split-granularity bin packing when `data-evolution.enabled` is\nenabled.\n\nData evolution splits must remain intact. Add split-granularity bin\npacking to avoid reshuffling files across splits, while still grouping\nwhole splits by target size. Oversized splits are kept as-is."
    },
    {
      "commit": "0d1589a63efa7b34eb6d8336ca13fddfef97a1dd",
      "tree": "561b0988d079ff1283ede58692f6f5de9cecb5fe",
      "parents": [
        "79c51b92161cefb25572672eb82c052dd441125e"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Tue Jun 02 23:15:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 23:15:17 2026 +0800"
      },
      "message": "[python] Fix read crash after a column type change on non-partitioned tables (#8073)\n\nReading a non-partitioned table after changing an existing column\u0027s type\ncrashes with an Arrow schema mismatch:\n\n```\nArrowInvalid: Schema at index 1 was different:\nv: decimal128(10, 2)   (file written before the type change)\nvs\nv: decimal128(20, 2)   (file written after)\n```\n\nWhen a table has no partition keys and the read needs no column\nreordering,\n`DataFileBatchReader` returns the format reader\u0027s batch as-is, so\ncolumns from\nolder-schema files keep their original physical types. The output type\nthen\ndepends on whether the read happens to span newer-schema files, and\nconcatenation fails when it does."
    },
    {
      "commit": "79c51b92161cefb25572672eb82c052dd441125e",
      "tree": "4ae06b696a381620f828de63acaf3127b0433518",
      "parents": [
        "8faf4c6b6d2c4f7aa2e0973b600d5ccc8944ef9f"
      ],
      "author": {
        "name": "chaoyang",
        "email": "chaoyang@apache.org",
        "time": "Tue Jun 02 23:02:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 23:02:17 2026 +0800"
      },
      "message": "[python] Require polars \u003e\u003d 1.32 on Python 3.9+ to fix reading nested map types (#8091)"
    },
    {
      "commit": "8faf4c6b6d2c4f7aa2e0973b600d5ccc8944ef9f",
      "tree": "beac8eb313f85240e60482b2735a1248dad54650",
      "parents": [
        "e838bfcd6f47d45d4d2d3675d5b4c7194fdd425c"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Tue Jun 02 21:47:22 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 21:47:22 2026 +0800"
      },
      "message": "[python] Honor sequence.field on the primary-key read path (#8075)"
    },
    {
      "commit": "e838bfcd6f47d45d4d2d3675d5b4c7194fdd425c",
      "tree": "445da31ba540ad114df4f2a3f2b168fa8a3f470a",
      "parents": [
        "8d10ae5587a53df12c1dc07abc4180e2137bac9f"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Tue Jun 02 21:42:59 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 21:42:59 2026 +0800"
      },
      "message": "[python][ray] Reject matched updates on partitioned tables in merge_into (#8078)"
    },
    {
      "commit": "8d10ae5587a53df12c1dc07abc4180e2137bac9f",
      "tree": "1b572c818923ba3d455a871ebef181b2c8a8e540",
      "parents": [
        "0609e8670a87fc6289e195617507c70da5173544"
      ],
      "author": {
        "name": "huangxiaoping",
        "email": "1754789345@qq.com",
        "time": "Tue Jun 02 21:41:54 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 21:41:54 2026 +0800"
      },
      "message": "[docs] Fix hive migration wording (#8079)"
    },
    {
      "commit": "0609e8670a87fc6289e195617507c70da5173544",
      "tree": "9c0dbaeedbe96544bc2d6f77ffcf7267f9e83971",
      "parents": [
        "0444bdc46a1f9f41b06c4da3b128259accc3e51c"
      ],
      "author": {
        "name": "Stefanietry",
        "email": "zhou1172026225@gmail.com",
        "time": "Tue Jun 02 21:35:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 21:35:47 2026 +0800"
      },
      "message": "[spark] support return score on vector search (#8068)"
    },
    {
      "commit": "0444bdc46a1f9f41b06c4da3b128259accc3e51c",
      "tree": "8c37b18797792c7f160e12dd2daf4cc7a456068e",
      "parents": [
        "8c41e8baf0061806dd0abb5ef2a241afadc7f028"
      ],
      "author": {
        "name": "Stefanietry",
        "email": "zhou1172026225@gmail.com",
        "time": "Tue Jun 02 15:54:27 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 15:54:27 2026 +0800"
      },
      "message": "[hive] fix create object inspector factory for blob (#8070)"
    },
    {
      "commit": "8c41e8baf0061806dd0abb5ef2a241afadc7f028",
      "tree": "f07a18dddb789be0664389521bc272eba75bb59a",
      "parents": [
        "ed220745baea8126253c86bcbaade32753912d34"
      ],
      "author": {
        "name": "Liurnly",
        "email": "masterwangzx@gmail.com",
        "time": "Tue Jun 02 15:50:00 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 15:50:00 2026 +0800"
      },
      "message": "[spark] Add data-evolution.merge-into.file-pruning option (#8065)\n\nAdd `data-evolution.merge-into.file-pruning` for MergeInto partial\ncolumn update on data-evolution tables. When disabled, this option skips\nthe file-level pruning step. It is useful when most files in the target\npartition are expected to be updated, so the overhead of collecting\ntouched file IDs outweighs the benefit of pruning untouched files.\n\nWhen file pruning is skipped, Spark merge into still pushes down\ntarget-table partition filters from the MERGE ON condition to avoid\nscanning unrelated partitions."
    },
    {
      "commit": "ed220745baea8126253c86bcbaade32753912d34",
      "tree": "f1b45eaf5a39eea95e20dcb02eb7cfdbb595cec3",
      "parents": [
        "cf9cdf9cb6772b6dcd03bb09c98b9a476adbf00b"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Tue Jun 02 14:55:23 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 14:55:23 2026 +0800"
      },
      "message": "[python][ray] Reject unsafe primary-key writes (#8071)\n\nRay writes create one independent Paimon writer per write task. For\nprimary-key tables, this is unsafe unless all rows for each `(partition,\nbucket)` are routed to one writer.\n\nThe previous Ray path only guarded HASH_FIXED primary-key tables in\ndefault/off mode. Default primary-key tables use dynamic buckets\n(`bucket \u003d -1`), so they were written as-is. Multiple Ray tasks could\nthen assign buckets and sequence numbers from independent local state,\nproducing duplicate keys with overlapping `_SEQUENCE_NUMBER`.\n\nThis PR makes Ray primary-key writes fail fast unless the table is\nHASH_FIXED and `hash_fixed_precluster\u003d\"map_groups\"` is selected.\nAppend-only non-HASH_FIXED tables still pass through unchanged. The\nPyPaimon Ray docs are updated to match the new behavior."
    },
    {
      "commit": "cf9cdf9cb6772b6dcd03bb09c98b9a476adbf00b",
      "tree": "dab8b07fdcfd284a02c178318a0d44005b7195c0",
      "parents": [
        "58ed52a42c5b43d1ca6f9c23f3b4a5c6b5328949"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Tue Jun 02 12:17:04 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 02 12:17:04 2026 +0800"
      },
      "message": "[core] Optimize deletion vector index scan (#8066)\n\nOptimize deletion vector index scanning during snapshot planning by\navoiding unnecessary manifest reads, limiting scans when no DV meta\ncache is available, and lazily warming the DV meta cache when it is\navailable."
    },
    {
      "commit": "58ed52a42c5b43d1ca6f9c23f3b4a5c6b5328949",
      "tree": "023d2161ead18ec8ee668d3aea2f4314a37b5848",
      "parents": [
        "cf1d2ce05c5b8c9afc00fc31a7c170c770b9a402"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Mon Jun 01 22:04:06 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 22:04:06 2026 +0800"
      },
      "message": "[python][daft] Preserve pushed filters across source serialization (#8061)\n\n`PaimonDataSource.__setstate__()` restores pushed filter state before\nreopening the table, but `_init_table()` then reset `_pushed_filters`,\n`_paimon_predicate`, and `_remaining_filters` to `None`.\n\nThis made serialized Daft sources lose filters already accepted by\n`push_filters()`. In fallback reads, `get_tasks(Pushdowns(filters\u003dNone,\nlimit\u003d1))` could then plan an unfiltered limited read and return the\nwrong row.\n\nThis PR moves pushdown state initialization to `__init__()`, keeps\n`_init_table()` limited to table-derived metadata, and includes\n`_pushed_filters` in the serialized state for consistent explain/debug\noutput."
    },
    {
      "commit": "cf1d2ce05c5b8c9afc00fc31a7c170c770b9a402",
      "tree": "97724c23fd265596f7fe8d9e587ebfd6b3cea6c6",
      "parents": [
        "0978e4c17512a6bbf441801ae1a3f9a83bf31eb5"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Mon Jun 01 22:01:59 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 22:01:59 2026 +0800"
      },
      "message": "[filesystem] Fix hadoop uber shaded dependencies (#8054)\n\n Fix `paimon-hadoop-uber` to make it a self-contained Hadoop uber jar.\n\n`paimon-hadoop-uber` is intended for applications without Hadoop\ndependencies. However, after #6327\nnarrowed the shade includes, the internal `paimon-hadoop-shaded`\nartifact was not included in the final\n  jar.\n\n  This causes:\n\n1. Maven/Gradle dependency resolution failure, because the published POM\nreferences `paimon-hadoop-\n  shaded`, which is intentionally not deployed.\n2. Runtime failure when users directly put `paimon-hadoop-uber.jar` on\nthe classpath, for example:\n\n  ```text\n  java.lang.NoClassDefFoundError: com/ctc/wstx/io/InputBootstrapper\n```\n\n  This patch includes org.apache.paimon:paimon-hadoop-shaded in the paimon-hadoop-uber shade artifact set."
    },
    {
      "commit": "0978e4c17512a6bbf441801ae1a3f9a83bf31eb5",
      "tree": "41b4c6ff68f2282693adca757ac7ef34f94cbdfa",
      "parents": [
        "2484af7e072548427266991fe3ccc738937682e6"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Mon Jun 01 21:54:43 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 21:54:43 2026 +0800"
      },
      "message": "[ray] Introduce Ray Data merge into (#8028)"
    },
    {
      "commit": "2484af7e072548427266991fe3ccc738937682e6",
      "tree": "f824f181639ee8c9225a8bea66d6722896c92341",
      "parents": [
        "8ad391f4fce57569afb6fe7eabae90085d0de99e"
      ],
      "author": {
        "name": "wangwj",
        "email": "hongli.wwj@gmail.com",
        "time": "Mon Jun 01 20:53:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 20:53:08 2026 +0800"
      },
      "message": "[Flink] Fix ReadOperator to stop reading after LIMIT on dedicated split path. (#7991)"
    },
    {
      "commit": "8ad391f4fce57569afb6fe7eabae90085d0de99e",
      "tree": "f98367d507649d6af7a2c39a05948dfcb25b164c",
      "parents": [
        "6bb161b5d6f209e82438ebb2da2d21be08835f37"
      ],
      "author": {
        "name": "chaoyang",
        "email": "chaoyang@apache.org",
        "time": "Mon Jun 01 19:32:17 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 19:32:17 2026 +0800"
      },
      "message": "[python] Implement aggregation merge engine in pypaimon (#7952)\n\nPort Java\u0027s `AggregateMergeFunction` to pypaimon, following the shape of\n#7745.\n\nShips the `FieldAggregator` framework, 9 value aggregators (`sum` /\n`max` / `min` / `last_value` / `last_non_null_value` / `first_value` /\n`first_non_null_value` / `bool_or` / `bool_and`) plus the `primary_key`\nplaceholder, the `AggregateMergeFunction` wired into\n`MergeFileSplitRead._build_merge_function`, and a `merge_engine_support`\nguard that rejects retract opt-ins, sequence fields and out-of-scope\naggregator identifiers (`collect` / `nested_update` / `theta_sketch` /\n`roaring_bitmap_*` / ...).\n\nRetract handling (DELETE / UPDATE_BEFORE) and the remaining 14 Java\naggregators are intentionally deferred to follow-up PRs, mirroring\n#7745\u0027s scoping."
    },
    {
      "commit": "6bb161b5d6f209e82438ebb2da2d21be08835f37",
      "tree": "20595d30d91b0c3a79b5812335da6235b788db41",
      "parents": [
        "1c8ba39b1de559601ec64f48c692bac56941e635"
      ],
      "author": {
        "name": "Stefanietry",
        "email": "zhou1172026225@gmail.com",
        "time": "Mon Jun 01 17:49:24 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 17:49:24 2026 +0800"
      },
      "message": "[core] support vector on spark (#8019)"
    },
    {
      "commit": "1c8ba39b1de559601ec64f48c692bac56941e635",
      "tree": "cce8f955befba3def1e2d7a3f7caea7f886c51f8",
      "parents": [
        "c73e287aaaeaea926a260ebb0dc6e8643e0b39bc"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Mon Jun 01 11:47:30 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:47:30 2026 +0800"
      },
      "message": "[python] Fix crash when updating all columns via update_by_arrow_with_row_id (#8043)\n\nUpdating **all columns** of a row via `update_by_arrow_with_row_id`\ncrashes with `ValueError: column_names cannot be empty`. This PR fixes\nit by resolving update columns from data when update type is unset."
    },
    {
      "commit": "c73e287aaaeaea926a260ebb0dc6e8643e0b39bc",
      "tree": "967e1bb317abd2ce2d21913a1c42a4c550e313b5",
      "parents": [
        "86a3af047c42c7c84bdef0ca6ae96bf4085e21c9"
      ],
      "author": {
        "name": "Zouxxyy",
        "email": "zouxinyu.zxy@alibaba-inc.com",
        "time": "Mon Jun 01 11:46:58 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:46:58 2026 +0800"
      },
      "message": "[spark] Decouple type-widening from merge-schema with an explicit switch (#8042)\n\nToday `write.merge-schema` couples column-addition with *unconditional*\ntype widening. That has two problems: it can attempt unsupported\nwidenings (e.g. `ARRAY\u003cINT\u003e` -\u003e `ARRAY\u003cBIGINT\u003e`) on a plain\ncolumn-addition write, and the widening behavior is inconsistent between\ncatalog writes and `MERGE INTO`.\n\nThis PR decouples the two with an explicit, opt-in switch:\n\n| Option | Default | Effect |\n|--------|---------|--------|\n| `write.merge-schema` | false | Evolve schema for **new columns only**;\nexisting column types are kept and incoming values are cast to them. |\n| `write.merge-schema.type-widening` | false | Additionally widen an\nexisting column type to a wider compatible type (e.g. `INT -\u003e BIGINT`,\n`DECIMAL` precision increase). |\n| `write.merge-schema.explicit-cast` | false | Additionally allow lossy\ncasts (e.g. `BIGINT -\u003e INT`, `STRING -\u003e DATE`). |\n\nTwo things worth calling out:\n\n1. **The write flow is `compute -\u003e cast -\u003e commit`.** For catalog writes\n(`saveAsTable` / `INSERT` / `writeTo`) and `MERGE INTO`, analysis first\ncomputes the merged (evolved) schema **in memory** (nothing is persisted\nyet) and casts the incoming data to it; the evolved schema is persisted\nonly by the final commit, which is **deferred to execution**.\n\n2. **Schema evolution on path-based DataFrame write is intentional.**\n`df.write.format(\"paimon\").save(path)` and the streaming sink also\nevolve the schema. They have no analysis-time cast hook, so they commit\nthe evolved schema and write the data as-is (works for column additions\nand widening, where the incoming data already matches the evolved\ntypes). This is by design, and is covered by the added\n`DataFrameWriteTest` / `PaimonSinkTest` cases."
    },
    {
      "commit": "86a3af047c42c7c84bdef0ca6ae96bf4085e21c9",
      "tree": "27d287ce6e7beb800bed788460037d9a51f8e6b6",
      "parents": [
        "6fd341250ab9852178144eeec4275ec19f448c8f"
      ],
      "author": {
        "name": "Kerwin Zhang",
        "email": "xiyu.zk@alibaba-inc.com",
        "time": "Mon Jun 01 11:28:55 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:28:55 2026 +0800"
      },
      "message": "[python][daft] Make Daft Paimon read source serializable (#8029)"
    },
    {
      "commit": "6fd341250ab9852178144eeec4275ec19f448c8f",
      "tree": "fad24d3427b3ab46d0341209515153e37f0335ee",
      "parents": [
        "5b654cae4da36c4f6b506df9e225f1ebae9b799d"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Mon Jun 01 11:07:42 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 11:07:42 2026 +0800"
      },
      "message": "[python][daft] Push down supported NOT predicates (#8059)\n\nDaft predicate conversion previously treated every `NOT` expression as\nunsupported, so filters such as `NOT (col \u003d\u003d lit)`, `NOT\ncol.is_in(...)`, and `NOT col.is_null()` stayed above the datasource\neven though pypaimon already has matching predicate types.\n\nThis change adds a narrow leaf-only rewrite for supported `NOT`\npredicates:\n- `NOT equal` -\u003e `notEqual`\n- `NOT in` -\u003e `notIn`\n- `NOT between` -\u003e `notBetween`\n- `NOT isNull` -\u003e `isNotNull`\n- `NOT isNotNull` -\u003e `isNull`\n\nIt intentionally does not add De Morgan rewrites for compound\nexpressions.\n\nThe change also fixes `notIn` Arrow evaluation to exclude null values.\nWithout this, fallback datasource tasks could return null rows for\npushed `NOT IN` filters, and pushed limits could then truncate valid\nrows."
    },
    {
      "commit": "5b654cae4da36c4f6b506df9e225f1ebae9b799d",
      "tree": "72e70a0d3ae81122acadd2f97673b66340966772",
      "parents": [
        "08b612b3cb0d9a6d118aeeb7dd6d4c9c8c44d893"
      ],
      "author": {
        "name": "Arnav Balyan",
        "email": "60175178+ArnavBalyan@users.noreply.github.com",
        "time": "Mon Jun 01 06:28:24 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 08:58:24 2026 +0800"
      },
      "message": "[core] Disable snapshot expiry in testDiscardDuplicateFilesMultiThread (#8058)"
    },
    {
      "commit": "08b612b3cb0d9a6d118aeeb7dd6d4c9c8c44d893",
      "tree": "a527e9fa5de4f738be5a1e6debc312237707a974",
      "parents": [
        "4c5844c9decd597b8135e90710305a264b41a17e"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Mon Jun 01 08:56:50 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 01 08:56:50 2026 +0800"
      },
      "message": "[python] Remove duplicate requests dependency (#8056)"
    },
    {
      "commit": "4c5844c9decd597b8135e90710305a264b41a17e",
      "tree": "f6ed832d459be67063274cb8b1bc2c3cd5e9b92d",
      "parents": [
        "72027ef95153588b519151fe31835da2ebfcda91"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Sun May 31 23:18:13 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 23:18:13 2026 +0800"
      },
      "message": "[python][daft] Split conjunctive filters for pushdown (#8055)\n\nDaft Paimon filter conversion previously treated each filter expression\nas all-or-nothing. For an expression like `A AND unsupported(B)`, the\nsupported conjunct `A` was not pushed to Paimon, so it could not\nparticipate in scan planning or file skipping.\n\nThis PR flattens only `AND` conjuncts in the Daft adapter layer and\nconverts each conjunct independently. Supported conjuncts are combined\ninto a Paimon `AND` predicate, while unsupported conjuncts remain as\nDaft post-scan filters. `OR` expressions are still pushed only when the\nwhole `OR` is supported."
    },
    {
      "commit": "72027ef95153588b519151fe31835da2ebfcda91",
      "tree": "c748f2a40481b37dd171060ac1ecde67edcb7af9",
      "parents": [
        "3336f14dcb5f6f6ab74578ecac21e27398fc6a4e"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Sun May 31 22:55:02 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 22:55:02 2026 +0800"
      },
      "message": "[python] Fix missing HTTP runtime dependencies (#8044)"
    },
    {
      "commit": "3336f14dcb5f6f6ab74578ecac21e27398fc6a4e",
      "tree": "2a45c695666d8f9702e8a66649779b2c0a729455",
      "parents": [
        "2b46053b5acc4bcd7cb845a6bb1a5e18429c52b5"
      ],
      "author": {
        "name": "Liurnly",
        "email": "masterwangzx@gmail.com",
        "time": "Sun May 31 22:51:23 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 22:51:23 2026 +0800"
      },
      "message": "[spark] Avoid serializing known splits in read builder (#8050)"
    },
    {
      "commit": "2b46053b5acc4bcd7cb845a6bb1a5e18429c52b5",
      "tree": "5bacd05f626381859f1ececb524addfcd637c7b9",
      "parents": [
        "3c2be085b6d1d4333b691547e55ed1aadefec87e"
      ],
      "author": {
        "name": "Liurnly",
        "email": "masterwangzx@gmail.com",
        "time": "Sun May 31 21:46:15 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:46:15 2026 +0800"
      },
      "message": "[spark] Broadcast data evolution firstRowIdToPartitionMap (#8051)\n\n`firstRowIdToPartitionMap` is used by data evolution MERGE INTO writes\nto locate the original partition and row count for each data file from\nits `firstRowId`. This mapping is built from scanned `DataSplit`s and\nthen used by each Spark task while writing evolved rows.\n\nFor merge plans that affect many files across many partitions, the map\ncan become large. Capturing it directly in every task closure duplicates\nthe serialized map for each task and can make Spark task serialization\nvery large."
    },
    {
      "commit": "3c2be085b6d1d4333b691547e55ed1aadefec87e",
      "tree": "97144f763e3e7f3ce428243ab87deac7e7b9cc12",
      "parents": [
        "dd79ff78c51b9c040196f7841d4837d2e2ed87f7"
      ],
      "author": {
        "name": "YeJunHao",
        "email": "41894543+leaves12138@users.noreply.github.com",
        "time": "Sun May 31 21:35:41 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:35:41 2026 +0800"
      },
      "message": "[core] Remove debug print from compact buckets table (#8020)"
    },
    {
      "commit": "dd79ff78c51b9c040196f7841d4837d2e2ed87f7",
      "tree": "d286241963474af8fc3a5f29d11389db51605170",
      "parents": [
        "c38d3e86b8171b36c6d809f4370db2428cc1fac7"
      ],
      "author": {
        "name": "AN Long",
        "email": "aisk@users.noreply.github.com",
        "time": "Sun May 31 22:32:28 2026 +0900"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:32:28 2026 +0800"
      },
      "message": "[python] Add requests as dependency (#7960)\n\n### Purpose\n\nCurrently `requests` is required to use Paimon; otherwise importing\n`pypaimon` on the current main branch raises an error:\n\n```\n\u003e\u003e\u003e import pypaimon\nTraceback (most recent call last):\n  File \"\u003cpython-input-0\u003e\", line 1, in \u003cmodule\u003e\n    import pypaimon\n  File \"/home/asaka/paimon/paimon-python/pypaimon/__init__.py\", line 26, in \u003cmodule\u003e\n    from pypaimon.catalog.catalog_factory import CatalogFactory\n  File \"/home/asaka/paimon/paimon-python/pypaimon/catalog/catalog_factory.py\", line 23, in \u003cmodule\u003e\n    from pypaimon.catalog.filesystem_catalog import FileSystemCatalog\n  File \"/home/asaka/paimon/paimon-python/pypaimon/catalog/filesystem_catalog.py\", line 45, in \u003cmodule\u003e\n    from pypaimon.table.file_store_table import FileStoreTable\n  File \"/home/asaka/paimon/paimon-python/pypaimon/table/file_store_table.py\", line 25, in \u003cmodule\u003e\n    from pypaimon.read.read_builder import ReadBuilder\n  File \"/home/asaka/paimon/paimon-python/pypaimon/read/read_builder.py\", line 20, in \u003cmodule\u003e\n    from pypaimon.common.predicate import Predicate\n  File \"/home/asaka/paimon/paimon-python/pypaimon/common/predicate.py\", line 29, in \u003cmodule\u003e\n    from pypaimon.manifest.schema.simple_stats import SimpleStats\n  File \"/home/asaka/paimon/paimon-python/pypaimon/manifest/schema/simple_stats.py\", line 22, in \u003cmodule\u003e\n    from pypaimon.table.row.generic_row import GenericRow\n  File \"/home/asaka/paimon/paimon-python/pypaimon/table/row/generic_row.py\", line 28, in \u003cmodule\u003e\n    from pypaimon.table.row.blob import BlobData\n  File \"/home/asaka/paimon/paimon-python/pypaimon/table/row/blob.py\", line 24, in \u003cmodule\u003e\n    from pypaimon.common.uri_reader import UriReader, FileUriReader\n  File \"/home/asaka/paimon/paimon-python/pypaimon/common/uri_reader.py\", line 23, in \u003cmodule\u003e\n    import requests\nModuleNotFoundError: No module named \u0027requests\u0027\n```\n\nWith the latest release of `pypaimon` on PyPI, we can import `pypaimon`,\nbut `from pypaimon import CatalogFactory` raises the same error.\n\nAdding `requests` to `requirements.txt` resolves this error.\n`requests 2.21.0` was released in 2018; I think it\u0027s old enough to be\nused as the lower bound of the version range.\n\n### Tests"
    },
    {
      "commit": "c38d3e86b8171b36c6d809f4370db2428cc1fac7",
      "tree": "fa0a80eade1fef3ccbcecc6ff8558056bcbf9be0",
      "parents": [
        "07aa62d25bd2671265e58f73fcc20266e45e102c"
      ],
      "author": {
        "name": "Arnav Balyan",
        "email": "60175178+ArnavBalyan@users.noreply.github.com",
        "time": "Sun May 31 19:01:46 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:31:46 2026 +0800"
      },
      "message": "[Schema] Fix catalog getTable failing for bucketed append tables created with Paimon 0.7 (#8025)"
    },
    {
      "commit": "07aa62d25bd2671265e58f73fcc20266e45e102c",
      "tree": "61e79e28d9da0e4df1d9d12783342f0b995c71a4",
      "parents": [
        "23fbf740dca4b11c42fc6a6fa70aef055d16155c"
      ],
      "author": {
        "name": "Arnav Balyan",
        "email": "60175178+ArnavBalyan@users.noreply.github.com",
        "time": "Sun May 31 19:01:23 2026 +0530"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:31:23 2026 +0800"
      },
      "message": "[Table] Fix copyWithLatestSchema option drop after table was opened (#8026)"
    },
    {
      "commit": "23fbf740dca4b11c42fc6a6fa70aef055d16155c",
      "tree": "6ca6abc0c20baa75c38a25157746438435e6dd22",
      "parents": [
        "11e899f000e8b4d2b16ed1c378442355c497d4a4"
      ],
      "author": {
        "name": "kevin",
        "email": "38326692+qingwei727@users.noreply.github.com",
        "time": "Sun May 31 21:30:44 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:30:44 2026 +0800"
      },
      "message": "[rest] Remove renameBranch REST endpoint (#8023)\n\nThe server-side implementation of `renameBranch` ultimately relies on\nthe underlying object store\u0027s rename (e.g. OSS), which is **not atomic**.\nWhile the rename is in progress, concurrent writes to the source branch can be\nsilently lost, and there is currently no reliable way for the server to fence those writes during the operation.\n\nBecause of this consistency risk, we\u0027d rather not offer `renameBranch`\nover the REST catalog at all than offer it with a known data-loss window.\nThis PR retires only the REST surface — the `Catalog` interface and the\nnon-REST implementations (file-system branch manager, Spark/Flink procedures,\netc.) are intentionally left untouched, so users on those code paths are not affected."
    },
    {
      "commit": "11e899f000e8b4d2b16ed1c378442355c497d4a4",
      "tree": "6ffc878375d91dca990a9619ea889bea605846fd",
      "parents": [
        "d7efb485294f8de43cbf7252d26e7ec7380f6732"
      ],
      "author": {
        "name": "wangwj",
        "email": "hongli.wwj@gmail.com",
        "time": "Sun May 31 21:29:31 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:29:31 2026 +0800"
      },
      "message": "[core] Fix ApplyBitmapIndexRecordReader to stop readBatch after bitmap selection is exhausted. (#7994)"
    },
    {
      "commit": "d7efb485294f8de43cbf7252d26e7ec7380f6732",
      "tree": "2f26bae51cf589420740fff33b3e65e01149a410",
      "parents": [
        "14cf97d3f52c83b589fd2192c78857b2e14d83fb"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Sun May 31 21:27:37 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:27:37 2026 +0800"
      },
      "message": "[project] Add security threat model and security docs page (#8038)"
    },
    {
      "commit": "14cf97d3f52c83b589fd2192c78857b2e14d83fb",
      "tree": "3e801f5c1cacdb93bb697014ae2b97db5eaca3cf",
      "parents": [
        "d50becf97d0ef1e9633978cdf4cae2c5c1387120"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Sun May 31 21:24:39 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:24:39 2026 +0800"
      },
      "message": "[python] Make HASH_FIXED Ray pre-clustering opt-in (#8046)"
    },
    {
      "commit": "d50becf97d0ef1e9633978cdf4cae2c5c1387120",
      "tree": "ecc3fab95155dfface7b7928785dbf4ed2e60dc2",
      "parents": [
        "cc1bfae3b91ecfba6b5f140da2adf883ea9c7053"
      ],
      "author": {
        "name": "XiaoHongbo",
        "email": "xiaohongbo.xhb@alibaba-inc.com",
        "time": "Sun May 31 21:23:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:23:47 2026 +0800"
      },
      "message": "[python] Reject or drop stale index when updating globally indexed columns (#8045)"
    },
    {
      "commit": "cc1bfae3b91ecfba6b5f140da2adf883ea9c7053",
      "tree": "21b05378d8b69e76c8f2121cc9aaffb88c65c434",
      "parents": [
        "b93b6b5bfb6768fdfc1720cff5afa25c5643bf34"
      ],
      "author": {
        "name": "Kerwin Zhang",
        "email": "xiyu.zk@alibaba-inc.com",
        "time": "Sun May 31 21:23:15 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:23:15 2026 +0800"
      },
      "message": "[core][spark] Use DV-aware tight bounds for Spark MIN/MAX pushdown (#8047)\n\nSpark currently disables MIN/MAX aggregate pushdown for any table with\n`deletion-vectors.enabled\u003dtrue`. This is correct but too conservative:\nmany DV-enabled non-primary-key tables, or many splits inside them, do\nnot actually have deleted rows. In those cases the recorded file min/max\nstats are still tight and can safely answer MIN/MAX.\n\nThis PR makes the decision based on runtime split metadata instead of\nthe table-level DV option. It derives whether a data file still has\ntight stats from `DataFileMeta.deleteRowCount` and the paired\n`DeletionFile.cardinality`, then allows Spark MIN/MAX pushdown only when\nevery file in the split is tight.\n\nThis keeps the existing safety behavior for files with real deletes or\nunknown DV cardinality, while recovering MIN/MAX pushdown for DV-enabled\ntables/splits that have no effective deletions."
    },
    {
      "commit": "b93b6b5bfb6768fdfc1720cff5afa25c5643bf34",
      "tree": "15f7454598536c29c082893fd1f495ebed850d5a",
      "parents": [
        "0a9a930f6cb7bd8b919d324dc117f264caaef3ba"
      ],
      "author": {
        "name": "AuroraVoyage",
        "email": "83496233+PyRSA@users.noreply.github.com",
        "time": "Sun May 31 21:21:47 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 21:21:47 2026 +0800"
      },
      "message": "[core] Support nested sequence fields in FieldNestedUpdateAgg operator (#8048)"
    },
    {
      "commit": "0a9a930f6cb7bd8b919d324dc117f264caaef3ba",
      "tree": "530421d26e70ab60b4c3fe63da5b7c23adab5300",
      "parents": [
        "609745ec12d6f10de1c0f41d0931d7afbfc0c23f"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Sun May 31 20:41:06 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 20:41:06 2026 +0800"
      },
      "message": "[python][daft] Add Daft-side scan explain diagnostics (#8017)\n\nDaft\u0027s Paimon reader already chooses between native Parquet reads and\npypaimon fallback internally, but that routing decision was not\nobservable from the public Paimon Daft API. `ReadBuilder.explain()` only\ndescribes the Paimon scan plan, so users could not diagnose whether a\nslow scan was caused by PK merge, deletion vectors, BLOB columns,\nnon-Parquet format, or pushdown behavior.\n\nThis PR adds a structured Daft-side scan explain API:\n\n  - `explain_paimon_scan(...)`\n  - `PaimonTable.explain_scan(...)`\n\nThe result includes the underlying Paimon scan explain plus Daft reader\nrouting details: native/fallback split and file counts, fallback\nreasons, pushed/remaining filters, projection/limit pushdown status, and\noptional per-split reader mode.\n\nThe implementation reuses the same scan builder, partition filtering,\nand native/fallback routing helpers used by\n`PaimonDataSource.get_tasks()` to avoid divergence between diagnostics\nand actual execution."
    },
    {
      "commit": "609745ec12d6f10de1c0f41d0931d7afbfc0c23f",
      "tree": "38b69ddc368779497ceb54828a8f81bc2708231b",
      "parents": [
        "4bb0871550ac9983146bea0f5da8ad5ba5dbf7a1"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Sun May 31 15:09:27 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 15:09:27 2026 +0800"
      },
      "message": "[vortex] Refactor JNI layer to use DataSource/Session API replacing proto-based Array/DType (#8040)\n\nReplace the old protobuf-based JNI bindings (Array, DType, File,\nexpressions package) with a new DataSource/Session/Scan/Partition API that provides a cleaner\nabstraction for Vortex file access. Update VortexRecordsReader, VortexRecordsWriter,\nand VortexPredicateConverter to work with the new JNI interface."
    },
    {
      "commit": "4bb0871550ac9983146bea0f5da8ad5ba5dbf7a1",
      "tree": "59f7714168f733b495f185a57817543fb85d0523",
      "parents": [
        "bcdd0d6f6953e71406a234e26fc132c7c65f7167"
      ],
      "author": {
        "name": "cxzl25",
        "email": "3898450+cxzl25@users.noreply.github.com",
        "time": "Sun May 31 10:23:15 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 10:23:15 2026 +0800"
      },
      "message": "[hive] Add hive.skip-update-stats config option to control DO_NOT_UPDATE_STATS in HiveCatalog alter table (#7686)"
    },
    {
      "commit": "bcdd0d6f6953e71406a234e26fc132c7c65f7167",
      "tree": "40e30e80f88dc5ed4c30c5cb3cac7ed762d034cd",
      "parents": [
        "53d3fa6443a7c7cd7b0e6dfccd6332fc7f14faaf"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Sun May 31 10:16:48 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 31 10:16:48 2026 +0800"
      },
      "message": "[python] Fix empty Ray overwrite commit (#8041)\n\n`PaimonDatasink.on_write_complete()` previously returned early when all\nRay write tasks produced empty commit messages. This made\n`write_paimon(empty_ds, table, overwrite\u003dTrue)` a silent no-op, so\nexisting data was kept even though Paimon overwrite commit semantics\nrequire empty overwrite commits to reach `TableCommit`.\n\nThis PR keeps the empty-message fast path only for append writes. For\noverwrite writes, Ray now calls `TableCommit.commit([])`, allowing\n`FileStoreCommit.overwrite()` to delete the target range for\nstatic/unpartitioned overwrite and preserve dynamic-partition empty\noverwrite no-op semantics."
    },
    {
      "commit": "53d3fa6443a7c7cd7b0e6dfccd6332fc7f14faaf",
      "tree": "df4dd71c94cc8bc7e9cbcfa2b44baa15417fe179",
      "parents": [
        "ca6e718d5a77bbbea3ffa3643537cb0306a9dfa3"
      ],
      "author": {
        "name": "QuakeWang",
        "email": "45645138+QuakeWang@users.noreply.github.com",
        "time": "Sat May 30 23:13:36 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 23:13:36 2026 +0800"
      },
      "message": "[python][daft] Fix single-part catalog table lookup (#8039)\n\nDaft Catalog accepts single-part identifiers, and PaimonCatalog\npreviously forwarded them to pypaimon as plain table names. For table\nAPIs, this is invalid because pypaimon\u0027s Identifier parser\nrequires `db.table`, so `get_table(\"missing_table\")` and\n`drop_table(\"missing_table\")` leaked `ValueError` instead of Daft\n`NotFoundError`.\n\nThis PR adds table-specific identifier handling in `PaimonCatalog`:\nsingle-part table identifiers are handled at the Daft catalog layer as\nnot found, and `has_table(\"missing_table\")` now returns `False`."
    },
    {
      "commit": "ca6e718d5a77bbbea3ffa3643537cb0306a9dfa3",
      "tree": "ca87b86d770eae2d152db115acb197cc198c2cfe",
      "parents": [
        "c3504e26d4b43afcfe8bbe2557661db276c850c8"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Sat May 30 12:03:35 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 12:03:35 2026 +0800"
      },
      "message": "[spark] Support Parquet format in COPY INTO (#8037)"
    },
    {
      "commit": "c3504e26d4b43afcfe8bbe2557661db276c850c8",
      "tree": "8424da5051573cfb25b968c2708c2a26ad358884",
      "parents": [
        "a40998ce600b544b1ac490849ff1f2718ea73634"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Sat May 30 09:38:32 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat May 30 09:38:32 2026 +0800"
      },
      "message": "[python] Implement first-row merge engine for read path (#7968)\n\nAdd read-path support for the `first-row` merge engine in pypaimon. The\nfirst-row engine keeps only the earliest row per primary key, which is\nthe opposite of the default `deduplicate` engine that keeps the latest.\n\nPreviously, reading a table configured with `merge-engine: first-row`\nraised `NotImplementedError`. This PR implements the merge function and\nwires it into the read pipeline."
    },
    {
      "commit": "a40998ce600b544b1ac490849ff1f2718ea73634",
      "tree": "258ec0cf751d479fa656d9ff02363a233a371a58",
      "parents": [
        "7250faca063404c5724136eaef9a44817b376944"
      ],
      "author": {
        "name": "JingsongLi",
        "email": "jingsonglee0@gmail.com",
        "time": "Sat May 30 09:36:22 2026 +0800"
      },
      "committer": {
        "name": "JingsongLi",
        "email": "jingsonglee0@gmail.com",
        "time": "Sat May 30 09:36:22 2026 +0800"
      },
      "message": "[test] Fix unstable test: MySqlSyncDatabaseActionITCase\n"
    },
    {
      "commit": "7250faca063404c5724136eaef9a44817b376944",
      "tree": "52e5d3e2ff1ba30c70d14ec27854b7cb194721b8",
      "parents": [
        "1f9dba508fecf452d4fb036d3d98767332cf6e55"
      ],
      "author": {
        "name": "JingsongLi",
        "email": "jingsonglee0@gmail.com",
        "time": "Fri May 29 22:32:24 2026 +0800"
      },
      "committer": {
        "name": "JingsongLi",
        "email": "jingsonglee0@gmail.com",
        "time": "Fri May 29 22:33:13 2026 +0800"
      },
      "message": "[docs] Restructure table docs: rename sections and add Multimodal Table\n\n- Rename \u0027Table with PK\u0027 to \u0027PrimaryKey Table\u0027\n- Rename \u0027Table w/o PK\u0027 to \u0027Append Table\u0027\n- Rename \u0027Bucketed\u0027 to \u0027Bucketed Append\u0027\n- Extract Data Evolution, Blob Storage, Vector Storage, Global Index\n  into a new \u0027Multimodal Table\u0027 section\n- Reorder sidebar: Append Table before PrimaryKey Table\n- Move Concepts to hero links on homepage, add Spark card\n- Update all cross-references and add redirects\n"
    },
    {
      "commit": "1f9dba508fecf452d4fb036d3d98767332cf6e55",
      "tree": "c165f459d00be4ccea4c5f03b8cea6bdcd018528",
      "parents": [
        "93d449e65046091a636dfa0002daeb95627d9f0b"
      ],
      "author": {
        "name": "huangxiaoping",
        "email": "1754789345@qq.com",
        "time": "Fri May 29 22:12:31 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 22:12:31 2026 +0800"
      },
      "message": "[core] Improve sort buffer overflow message (#7995)\n\nImprove the error message thrown when a single record exceeds the\navailable sort buffer in `BinaryExternalSortBuffer`, so users can more\neasily understand that the failure is caused by an oversized row and\nknow to check the input data or increase `write-buffer-size`."
    },
    {
      "commit": "93d449e65046091a636dfa0002daeb95627d9f0b",
      "tree": "7af013659db7e90744e5995c9bb320550ae2a6ab",
      "parents": [
        "e8dd5d3cfdd29661da5152b44b972fbf3859ac9c"
      ],
      "author": {
        "name": "Kerwin Zhang",
        "email": "xiyu.zk@alibaba-inc.com",
        "time": "Fri May 29 21:50:25 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 21:50:25 2026 +0800"
      },
      "message": "[spark] Refactor SQL extensions parser into base + per-version wrappers (#8032)"
    },
    {
      "commit": "e8dd5d3cfdd29661da5152b44b972fbf3859ac9c",
      "tree": "e3e9b9bfa15438ad9f25328214c782a274248d83",
      "parents": [
        "22f3fd344cb86bb39d8149b764109cdb5c24a70c"
      ],
      "author": {
        "name": "Faiz",
        "email": "wxy407679@antgroup.com",
        "time": "Fri May 29 21:49:12 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 21:49:12 2026 +0800"
      },
      "message": "[python] supports random access of blob files (#8033)"
    },
    {
      "commit": "22f3fd344cb86bb39d8149b764109cdb5c24a70c",
      "tree": "f1dfb01f97f0ab466436f57b915d2508c1aed3d6",
      "parents": [
        "95934286ca0be7e48f4eb07295d5fe64e9f0b6a6"
      ],
      "author": {
        "name": "Zouxxyy",
        "email": "zouxinyu.zxy@alibaba-inc.com",
        "time": "Fri May 29 21:45:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 21:45:20 2026 +0800"
      },
      "message": "[spark] Fix schema merge creating duplicate columns with case-mismatched names (#8034)\n\nWhen `merge-schema` is enabled and source column names differ only in\ncase from target columns (e.g. source `ID` vs target `id`),\n`SchemaMergingUtils` treats them as new columns due to case-sensitive\n`HashMap` lookups. This causes duplicate columns in the schema and makes\nthe table unreadable (`Field names must be unique`).\n\nThis PR adds a `caseSensitive` parameter through the schema merge chain\n(`SchemaMergingUtils` → `SchemaManager` → `FileStore` → Spark\n`SchemaHelper`), using `TreeMap(String.CASE_INSENSITIVE_ORDER)` for\nfield matching when `caseSensitive\u003dfalse`. Spark callers pass\n`spark.sql.caseSensitive` config (default `false`).\n\nAffects both `INSERT ... merge-schema\u003dtrue` and `MERGE INTO ...\nmerge-schema\u003dtrue` paths."
    },
    {
      "commit": "95934286ca0be7e48f4eb07295d5fe64e9f0b6a6",
      "tree": "af586b360beeecb5f9c3ab5aab4c041b7bbc228b",
      "parents": [
        "5afb5b66784fab6dcf26632f78934d27b0d1a9a3"
      ],
      "author": {
        "name": "Faiz",
        "email": "wxy407679@antgroup.com",
        "time": "Fri May 29 21:25:03 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 21:25:03 2026 +0800"
      },
      "message": "[python] optimize blob format reader by inline blob data (#8030)\n\nI\u0027m testing random access of python\u0027s blob. Reading 100 0.5MB blobs cost\n30s even blob-as-descriptor \u003d true. This is because each blob will open\nthe input stream once."
    },
    {
      "commit": "5afb5b66784fab6dcf26632f78934d27b0d1a9a3",
      "tree": "17d04c67f27c5aaa62e52f79aa380869d43b2af0",
      "parents": [
        "2471b0a5b0f8a6ec06dc2bdd89ba9389b7aa2142"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Fri May 29 21:22:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 21:22:20 2026 +0800"
      },
      "message": "[core][python] Extend comment directives for BLOB/VECTOR column creation (#8035)\n\nSupport __BLOB_VIEW_FIELD, __BLOB_EXTERNAL_STORAGE_FIELD, and\n__VECTOR_FIELD;dim comment directives in both CREATE TABLE and ALTER\nTABLE ADD COLUMN. Rename BlobSchemaUtils to ColumnDirectiveUtils and\nconsolidate all directive logic (parsing, type conversion, option\nmodification, drop cleanup) into it so SchemaManager stays clean with\nsingle-method calls.\n\nKey changes:\n- ColumnDirectiveUtils: applyAddColumnDirective (one-stop for ADD\nCOLUMN), applyDirectives (Schema-level for CREATE TABLE),\nremoveDroppedDirectiveOptions (clean options on DROP COLUMN by\nBLOB/VECTOR type)\n- SchemaManager: uses ColumnDirectiveUtils for create_table, addColumn,\ndropColumn\n- Python: mirrors Java with column_directive_utils.py and updated\nschema_manager.py\n- Docs: all SQL examples use COMMENT directive as primary approach with\nFlink SQL / Spark SQL / Java API / Python API tabs; removed\nblob-field/vector-field option references from docs"
    },
    {
      "commit": "2471b0a5b0f8a6ec06dc2bdd89ba9389b7aa2142",
      "tree": "8594a170650ff38f42cd5f750d8d6c3e57b1bfa2",
      "parents": [
        "46b120da2918ea51d0f4c51768fed1184f4356d2"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Fri May 29 15:32:20 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 15:32:20 2026 +0800"
      },
      "message": "[global-index] Refactor index read path to async with CompletableFuture (#8003)"
    },
    {
      "commit": "46b120da2918ea51d0f4c51768fed1184f4356d2",
      "tree": "43d550ddcd841ff579da83336020c5937e1e3eb0",
      "parents": [
        "a9ce9b8b64beb280baf981825a2a463b226647a9"
      ],
      "author": {
        "name": "Faiz",
        "email": "wxy407679@antgroup.com",
        "time": "Fri May 29 13:25:49 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 13:25:49 2026 +0800"
      },
      "message": "[core] support adding blob column through comments (#7996)"
    },
    {
      "commit": "a9ce9b8b64beb280baf981825a2a463b226647a9",
      "tree": "e2cf58783ff1f6f9e20c6b6c302b62dd5cce947e",
      "parents": [
        "a2671e9c1ad7abb33ca51541060af24d0a9ea36e"
      ],
      "author": {
        "name": "Junrui Lee",
        "email": "jrlee.ljr@gmail.com",
        "time": "Fri May 29 13:23:43 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 13:23:43 2026 +0800"
      },
      "message": "[spark] Support JSON format in COPY INTO (#7993)\n\n- Add JSON format support for `COPY INTO` import and export, alongside\nexisting CSV support\n- JSON uses column-name matching (not positional), with options for\n`MULTI_LINE`, `NULL_IF`, `EMPTY_FIELD_AS_NULL`, and `COMPRESSION`\n- CSV-only options (e.g. `FIELD_DELIMITER`, `SKIP_HEADER`) are rejected\nfor JSON format with clear error messages"
    },
    {
      "commit": "a2671e9c1ad7abb33ca51541060af24d0a9ea36e",
      "tree": "da828604edefb5d0d9558b0871ae6a3c2a56566d",
      "parents": [
        "59722abe5fe49f12f6395f3467e38f1682618a90"
      ],
      "author": {
        "name": "Jingsong Lee",
        "email": "jingsonglee0@gmail.com",
        "time": "Fri May 29 11:42:21 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 11:42:21 2026 +0800"
      },
      "message": "[python] Rename DataBlobWriter to DedicatedFormatWriter and support blob+vector splitting (#8027)\n\nDataBlobWriter now handles normal + blob + vector columns (matching\nJava\u0027s DedicatedFormatRollingFileWriter), so rename it to\nDedicatedFormatWriter. Also add data evolution format tests covering\nparquet, blob, and vector paths."
    }
  ],
  "next": "59722abe5fe49f12f6395f3467e38f1682618a90"
}