)]}'
{
  "commit": "fd3a3134bdadff6db246784b4beb039bda70a200",
  "tree": "4dcc13dc2bae6c6056f3a7715383e0408df260f4",
  "parents": [
    "7d63a11f9e4873bfd109d850bfda4b15d64275c6"
  ],
  "author": {
    "name": "Jimmy",
    "email": "lianyukang@selectdb.com",
    "time": "Thu Apr 09 15:38:44 2026 +0800"
  },
  "committer": {
    "name": "GitHub",
    "email": "noreply@github.com",
    "time": "Thu Apr 09 15:38:44 2026 +0800"
  },
  "message": "[improve](compaction) Use segment footer raw_data_bytes for first-time batch size estimation (#62271)\n\n## Summary\n\n- When vertical compaction runs for the first time on a tablet (no\nhistorical sampling data), `estimate_batch_size()` previously returned a\nhardcoded value of 992, which could cause OOM for wide tables or be too\nconservative for narrow tables\n- This change uses `ColumnMetaPB.raw_data_bytes` from segment footer to\ncompute a per-row size estimate for the first compaction.\n`raw_data_bytes` records the original data size before encoding, which\nclosely approximates runtime `Block::bytes()`\n- Historical sampling now uses `Block::allocated_bytes()` instead of\n`bytes()` for more accurate memory estimation (`size()` vs `capacity()`)\n- Subsequent compactions with historical sampling data are completely\nunchanged\n\n### Key design decisions\n\n| Column type | Estimation strategy |\n|------------|-------------------|\n| Scalar (INT/VARCHAR etc.) | `raw_data_bytes / rows_with_data` +\nstructural compensation (+1 null map, +8 offset) |\n| Complex (ARRAY/MAP/STRUCT) | `raw_data_bytes / rows_with_data`, no\ncompensation (already includes recursive sub-writer data) |\n| VARIANT (root/subcolumn) | Fallback to 992 (`raw_data_bytes\u003d0 // TODO`\nin writer) |\n\n### Performance safeguards\n\n- Footer collection only runs on first compaction (no historical\nsampling data)\n- Skipped entirely when `compaction_batch_size` is manually set\n- OOM backoff and sparse optimization paths are untouched\n\n## Test plan\n\n- [ ] Wide table (200+ columns) first compaction does not OOM\n- [ ] Narrow table first compaction batch_size is close to upper limit\n- [ ] Multi-round compaction: first round uses footer, subsequent rounds\nuse historical sampling\n- [ ] Variant columns fallback to 992\n- [ ] Sparse optimization is not affected\n- [ ] `TestFirstCompactionUsesFooterEstimation` unit test passes\n- [ ] `TestFooterRawDataBytesAccuracy` unit test passes",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "e748ae460c7e0e4eef9a74331cb34b4e75d212d8",
      "old_mode": 33188,
      "old_path": "be/src/storage/iterator/vertical_merge_iterator.h",
      "new_id": "865399c47478c94656b03c2f7a5cf42dea78183d",
      "new_mode": 33188,
      "new_path": "be/src/storage/iterator/vertical_merge_iterator.h"
    },
    {
      "type": "modify",
      "old_id": "ec246f13c16d5d7dbfc49c71fd4431d234fcf513",
      "old_mode": 33188,
      "old_path": "be/src/storage/merger.cpp",
      "new_id": "dbd03666f7992b2d4d0880422498b8ea15b07457",
      "new_mode": 33188,
      "new_path": "be/src/storage/merger.cpp"
    },
    {
      "type": "modify",
      "old_id": "57b273caa592ecb372b07d972ed975f690fd9ba1",
      "old_mode": 33188,
      "old_path": "be/test/storage/compaction/vertical_compaction_test.cpp",
      "new_id": "c20f14e3685690cca52f27313a2b12a68f4138f6",
      "new_mode": 33188,
      "new_path": "be/test/storage/compaction/vertical_compaction_test.cpp"
    }
  ]
}
