1. 88c146e chore(ci): Clean up env variable leak in TestSqlConf (#18486) by Geser Dugarov · 17 hours ago master
  2. c83ae87 fix(flink): Handle bootstrap write metadata correctly after job resca… (#18485) by Shuo Cheng · 20 hours ago
  3. 8f153b2 perf(core): optimize rollback listing calls on metadata table (#18279) by Balajee Nagasubramaniam · 25 hours ago
  4. 5d61a35 refactor(flink): Refactor Flink compaction/clean pipeline with composite table service handlers (#18477) by Shuo Cheng · 26 hours ago
  5. eaa9c8b fix: fix BufferedReader resource leak in InputStreamConsumer (#18469) by mailtoboggavarapu-coder · 31 hours ago
  6. 6310d70 fix: avoid duplicate archived timeline instants from leftover merge files (#18408) by Surya Prasanna · 34 hours ago
  7. f063aa5 fix(hfile): use Hadoop WritableUtils VarInt encoding in HFile block index writer (#18465) by Asish Kumar · 2 days ago
  8. 65f996a feat(lance): throwing exception/guard for users trying to read Lance from non-spark engines (#18481) by Vova Kolmakov · 2 days ago
  9. 0591933 fix: remove unused code (#18473) by yuqi · 2 days ago
  10. bc7a766 feat: use ScanOperation for Spark 3.3 and 3.4 partition pruning (#17936) by Surya Prasanna · 3 days ago
  11. 1582d60 fix(flink): Reject deferred RLI initialization for flink writer (#18399) by Shuo Cheng · 3 days ago
  12. f2dea52 feat(schema): Add support to write shredded variants (#18036) by voonhous · 3 days ago
  13. fd20018 feat(vector): Add Spark VECTOR Search TVF with initial KNN algorithm (#18432) by Rahil C · 3 days ago
  14. 3c53b91 perf(metadata): avoid recursive calls for partition listing using catalog (#18265) by Surya Prasanna · 4 days ago
  15. d60be14 fix: do not shut down disruptor thread in snapshotState in flink connector (#18446) by Shihuan Liu · 4 days ago
  16. 0baabba feat(common): Add API to fetch log files created on or before given instant time (#18142) by Nada · 6 days ago
  17. 57c3f63 refactor: consolidate common utility classes for Flink CDC read (#18436) by Peter Huang · 7 days ago
  18. d69826c fix(flink): Trigger a failover after pending instants recommitted to b… (#18434) by Shuo Cheng · 7 days ago
  19. 3e662f9 feat(flink): improve bucket assignment for MOR with bucket index (#18444) by Peter Huang · 8 days ago
  20. 76b4fa4 feat(table-services): Support hoodie.clustering.enable.expirations to allow cleanup of failed clustering plans (intended for PreferWriterConflictResolutionStrategy) (#18302) by Krishen · 8 days ago
  21. 149d84f feat(metrics): Add table-specific metrics registry support for multi-tenant scenarios (#18179) by Prashant Wason · 8 days ago
  22. c9d0ffb fix(common): close parquet reader iterator on EOF (#18407) by Surya Prasanna · 8 days ago
  23. 9af7f29 perf(table-services): Only attempt scheduling log compaction if number of deltacommits is at least LogCompactionBlocksThreshold (#18306) by Krishen · 9 days ago
  24. 180592a feat(spark): implement column pruning for incremental queries (#17514) by Surya Prasanna · 9 days ago
  25. 447af5a fix(spark): Ignore duplicate fields when merging schema in IncrementalRelation (#17776) by Prashant Wason · 9 days ago
  26. 3e1d300 feat(vector): Add guard for user creating nested VECTOR (#18431) by Rahil C · 9 days ago
  27. ea59d60 fix(spark): validate and normalize incremental start/end instants (#18426) by yaojiejia · 9 days ago
  28. 6079b6a feat: support limit push down in Hudi Flink Source V2 (#18406) by Peter Huang · 9 days ago
  29. 35e2bbf refactor: add import (#18428) by voonhous · 10 days ago
  30. f6a33fd feat(lance): Implement canWrite() in HoodieSparkLanceWriter with configurable max file size for Lance (#18341) by Vova Kolmakov · 10 days ago
  31. bef0c54a [HUDI-3055] Fix hardcoded GZIP compression codec in HFileUtils (#18263) by ZZZxDong · 10 days ago
  32. dd8fe99 refactor: remove HoodieWriteConfig.getOrcCompressionCodec() function (#18422) by Shihuan Liu · 10 days ago
  33. 75918ea feat(blob): Create blobs in Spark SQL (#18347) by Tim Brown · 10 days ago
  34. 2b56eae feat(utilities): add option to make all schema columns nullable for backwards compatibility (#17777) by Prashant Wason · 10 days ago
  35. e270e25 perf: Improve Serialization Performance of BufferedRecord (#18418) by Shuo Cheng · 10 days ago
  36. 1ff0506 fix: Use target schema for non-FileBased/SchemaRegistry providers in SourceFormatAdapter (#17946) by Surya Prasanna · 10 days ago
  37. 236dc22 test(lance): Add test of bloomFilter support to TestLanceDataSource (#18388) by Vova Kolmakov · 11 days ago
  38. 54276a9 refactor: modularize long test methods in TestHoodieClientOnCopyOnWriteStorage (#18377) by yaojiejia · 11 days ago
  39. e4bc985 Explicitly state the spark stage name (#18416) by Surya Prasanna · 11 days ago
  40. e930b83 feat: Add Unshredded Variant read & write support (#17833) by voonhous · 11 days ago
  41. d241b09 fix(flink): Improve splits distribution strategy for mor table w/ bucket index (#18103) by Joy · 11 days ago
  42. 02e5efb fix: Fixed the issue of incorrect opName values in Flink bulk insert writing (#18313) by empcl · 12 days ago
  43. 1eb97b3 [HUDI-7030] Commit-based Clustering Plan Strategy (#18251) by Prashant Wason · 13 days ago
  44. bb5abb6 fix: Optimizing internal schema lookup in TableSchemaResolver (#18387) by Sivabalan Narayanan · 14 days ago
  45. 3fc1deb feat(vector): Support writing VECTOR to parquet and avro formats using Spark (#18328) by Rahil C · 14 days ago
  46. f15e1d0 feat: add Flink source reader function for cdc splits (#18361) by Peter Huang · 2 weeks ago
  47. 56bc283 feat(flink): add pre-commit validation framework for Flink - Phase 2 (#18362) by Xinli Shang · 2 weeks ago
  48. 69fa35b feat(metadata): Defer RLI initialization for fresh tables to optimize file group allocation (#18353) by Sivabalan Narayanan · 2 weeks ago
  49. 9859f9a feat(flink): Support more efficient customized serializer for HoodieRecordGlobalLocation (#18326) by Shuo Cheng · 2 weeks ago
  50. 2f07364 feat: add graceful handling for post-commit failures with metrics (#18196) by Surya Prasanna · 2 weeks ago
  51. 4133739 perf: Skip unnecessary clean planning for MOR metadata table file-version cleaning (#17943) by Surya Prasanna · 2 weeks ago
  52. c2b401e feat(flink): Support bootstrap from RLI to local RocksDB for flink bucket assigner (#18254) by Shuo Cheng · 2 weeks ago
  53. b60855d feat: support read splits limit in Hudi Flink Source V2 (#18370) by Peter Huang · 2 weeks ago
  54. 817b3ad fix(common): fix typos commited -> committed, commiting -> committing (#18363) by Xinli Shang · 3 weeks ago
  55. 331b018 feat(hive-sync): add Spark-catalog based metastore client implementation to avoid Hive-on-Spark classloader issues (#18203) by Surya Prasanna · 3 weeks ago
  56. 81a8c26 feat: support read commits limit in Hudi Flink Source V2 (#18369) by Peter Huang · 3 weeks ago
  57. 941ae62 fix: modify the incorrect Hive configuration in hoodie hive catalog (#18365) by YangXiao · 3 weeks ago
  58. f74bf3a fix(flink): enable batch read IT for flink source v2 (#18325) by Peter Huang · 3 weeks ago
  59. 3aef2ca fix: Fix flaky test TestProtoConversionUtil#allFieldsSet_wellKnownTypesAndTimestampsAsRecords (#18352) by Shuo Cheng · 3 weeks ago
  60. 967e456 feat(common): add core pre-commit validation framework - Phase 1 (#18068) by Xinli Shang · 3 weeks ago
  61. 26b324f fix(concurrency): detect rollback conflicts with ongoing commit operations (#18089) by Prashant Wason · 3 weeks ago
  62. b634262 feat(metadata-table): add config to disable automatic deletion of MDT partitions (#18181) by Prashant Wason · 3 weeks ago
  63. b0e40f6 feat(utilities): add DELETE operation support for HudiStreamer (#18088) by Prashant Wason · 3 weeks ago
  64. 4e21ff1 docs: Update the build instructions by mentioning profiles in README (#18310) by Ranga Reddy · 3 weeks ago
  65. 19c4cc9 fix: Use explicit Throwable type in AvroConversionUtils catch clause (#18342) by Y Ethan Guo · 3 weeks ago
  66. cad08b1 feat(lance): Support bloom filter in Lance writer and reader (#18304) by Vova Kolmakov · 3 weeks ago
  67. bbda242 feat(flink): Support write buffer based on flink managed memory (#18319) by Shuo Cheng · 3 weeks ago
  68. 3a1ea4b feat(metasync): Support HMS 4.x in JDBC sync mode via automatic Thrift fallback (#18227) by Balaji Varadarajan · 3 weeks ago
  69. 14a549f chore(ci): Add codecov coverage from tests running on Spark 4.0 (#18335) by Y Ethan Guo · 3 weeks ago
  70. b7b0b83 chore(ci): Simplify test combinations on Spark in Github actions (#18336) by Y Ethan Guo · 3 weeks ago
  71. a179555 chore(ci): Add test jobs and Codecov integration in GitHub Actions (#18225) by Y Ethan Guo · 3 weeks ago
  72. a16d431 feat(hudi-sync): Publish HUDI version to Hive metastore (#18307) by Krishen · 4 weeks ago
  73. f763da2 feat(conflict-resolution): Allow PreferWriterConflictResolutionStrategy to abort clustering if there is an ongoing write that is in requested state. (#18280) by Krishen · 4 weeks ago
  74. 74649c8 docs: RFC-102 - Spark Vector Search in Apache Hudi (#14218) by Rahil C · 4 weeks ago
  75. f64c93e fix(clustering): When inferring whether an instant is clustering, do not fail if replacecommit was rolled back already (by a concurrent writer) (#18288) by Krishen · 4 weeks ago
  76. 39f1f39 fix(common): Filter stray files when loading partitions in AbstractTableFileSystemView (#18047) by Prashant Wason · 4 weeks ago
  77. e6723a8 fix(metadata): Allow metadata table bootstrap when pending commits are being rolled back (#18033) by Prashant Wason · 4 weeks ago
  78. 39cb726 feat(common): Add Policy for cleanup/rollback before each write (#18197) by Krishen · 4 weeks ago
  79. ddfcc92 [HUDI-7503] Compaction execution should fail if another active writer is already executing the same plan (#18012) by Krishen · 4 weeks ago
  80. b77c7e5 fix(common): Handle zero byte properties file and ensure atomic writes during modification (#18058) by Prashant Wason · 4 weeks ago
  81. b31d5f7 refactor: rewrite executors tests to avoid code duplication (#18005) by yaojiejia · 4 weeks ago
  82. b01ae22 fix: sort partitions after filtering for clustering planning (#18092) by Prashant Wason · 4 weeks ago
  83. da244e1 feat(flink): Support create table DDL without primary key (#18086) by Prashant Wason · 4 weeks ago
  84. abb5fd2 feat: add support for touch partitions in HiveSyncTool (#18064) by Nada · 4 weeks ago
  85. cc3a529 fix(table-services): When single clustering group config is disabled, clustering should not create clustering groups with same number of input/output files (#18172) by Krishen · 4 weeks ago
  86. f867059 fix(spark): SparkSQL write queries should correctly infer HUDI configs from spark.hoodie.* configs in spark conf (#18297) by Krishen · 4 weeks ago
  87. b5daa30 feat(flink): Add helper functions to parse Kafka offset differences b… (#18125) by Xinli Shang · 4 weeks ago
  88. d13310c perf(table-services): Incremental clean planning (for COW) should ignore partitions from instants with only new file groups (#18016) by Krishen · 4 weeks ago
  89. 93b8e9f feat(flink): Add Kafka offset tracking to Flink Hudi commits (#18127) by Xinli Shang · 5 weeks ago
  90. 22aa1fa fix: Databricks Spark 3.4 Runtime compatibility for reading Hudi tables (#18292) by Y Ethan Guo · 5 weeks ago
  91. 8296df0 fix(flink): enable integration test for Hudi Flink Source V2 (#18287) by Peter Huang · 5 weeks ago
  92. 2c1cb39 feat(vector): add converters from spark to hoodieSchema for vectors (#18190) by Rahil C · 5 weeks ago
  93. 729b30c fix: Improve config docs of enabling column stats in metadata table (#18289) by Y Ethan Guo · 5 weeks ago
  94. bf4425b fix: Remove noisy logging when table partition is empty (#18290) by Y Ethan Guo · 5 weeks ago
  95. 6e0d786 fix(flink): Don't perform table service during mdt initialization if streaming write is enabled (#18283) by Shuo Cheng · 5 weeks ago
  96. 4499b0b feat(spark): ZooKeeper node should hold spark app id (for helping debug when lock is held for long time) (#18123) by Krishen · 5 weeks ago
  97. 9ba2760 feat(table-services): Support clustering file groups with earlier instants times first (#18174) by Krishen · 5 weeks ago
  98. 31b8706 feat(vector): Add further research for supporting VECTOR type to RFC-99 (#18184) by Rahil C · 5 weeks ago
  99. 3dcf4c6 feat(client): Add pre-write validator framework (#18239) by Nada · 5 weeks ago
  100. 3139a19 feat(table-services): Allow users to not parallelize each partition with engine context during clustering planning (#18191) by Krishen · 5 weeks ago