{
  "commit": "d0edf1a726f0c7eb466aed8e5c85bb5350d5482a",
  "tree": "5047f5827186d746d98bc2f30186ca774440f376",
  "parents": [
    "e2987ccfc9c01ab701122edd83a8db85a839ad37"
  ],
  "author": {
    "name": "Yicong Huang",
    "email": "17627829+Yicong-Huang@users.noreply.github.com",
    "time": "Fri Apr 24 14:30:17 2026 +0800"
  },
  "committer": {
    "name": "Ruifeng Zheng",
    "email": "ruifengz@foxmail.com",
    "time": "Fri Apr 24 14:30:17 2026 +0800"
  },
  "message": "[SPARK-55726][PYTHON][TEST][FOLLOW-UP] Write grouped benchmark data as one Arrow IPC stream per DataFrame\n\n### What changes were proposed in this pull request?\n\nFix `MockProtocolWriter.write_grouped_data_payload` in `python/benchmarks/bench_eval_type.py` to write each DataFrame as one Arrow IPC stream (multiple batches per stream), matching the real worker wire protocol. `num_dfs` is now inferred from the group tuple length. Grouped/cogrouped data factories return nested-list shapes accordingly.\n\n### Why are the changes needed?\n\nThe old writer emitted one stream per RecordBatch, while declaring ``num_dfs\u003d1`` upfront. When a group spanned more than one batch, the worker read the next stream\u0027s bytes as ``num_dfs`` for the next group. The ``lg_grp_few_col`` / ``lg_grp_many_col`` scenarios in ``GroupedMapPandasUDF{Time,Peakmem}Bench`` (100K rows/group with default ``MAX_RECORDS_PER_BATCH\u003d10_000``) hit this:\n\n```\npyspark.errors.exceptions.base.PySparkValueError:\n  [INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP] Invalid number of dataframes in group 1208025088.\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nRan all 15 `_GroupedMapPandasBenchMixin` scenario x UDF combinations and one scenario per other grouped/cogrouped bench class locally. All pass. `lg_grp_*` fail on master before this patch and pass after.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #55527 from Yicong-Huang/SPARK-55726-followup.\n\nAuthored-by: Yicong Huang \u003c17627829+Yicong-Huang@users.noreply.github.com\u003e\nSigned-off-by: Ruifeng Zheng \u003cruifengz@foxmail.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "b828c0e2811cc3cbb77773edd09ca83910de66e0",
      "old_mode": 33188,
      "old_path": "python/benchmarks/bench_eval_type.py",
      "new_id": "6dd352cf5d7e2cf16d66d23bba29a2969ca0dd68",
      "new_mode": 33188,
      "new_path": "python/benchmarks/bench_eval_type.py"
    }
  ]
}
