apache/hudi / HEAD
88c146e
chore(ci): Clean up env variable leak in TestSqlConf (#18486)
by Geser Dugarov
· 17 hours ago
master
c83ae87
fix(flink): Handle bootstrap write metadata correctly after job resca… (#18485)
by Shuo Cheng
· 20 hours ago
8f153b2
perf(core): optimize rollback listing calls on metadata table (#18279)
by Balajee Nagasubramaniam
· 25 hours ago
5d61a35
refactor(flink): Refactor Flink compaction/clean pipeline with composite table service handlers (#18477)
by Shuo Cheng
· 26 hours ago
eaa9c8b
fix: fix BufferedReader resource leak in InputStreamConsumer (#18469)
by mailtoboggavarapu-coder
· 31 hours ago
6310d70
fix: avoid duplicate archived timeline instants from leftover merge files (#18408)
by Surya Prasanna
· 34 hours ago
f063aa5
fix(hfile): use Hadoop WritableUtils VarInt encoding in HFile block index writer (#18465)
by Asish Kumar
· 2 days ago
65f996a
feat(lance): throwing exception/guard for users trying to read Lance from non-spark engines (#18481)
by Vova Kolmakov
· 2 days ago
0591933
fix: remove unused code (#18473)
by yuqi
· 2 days ago
bc7a766
feat: use ScanOperation for Spark 3.3 and 3.4 partition pruning (#17936)
by Surya Prasanna
· 3 days ago
1582d60
fix(flink): Reject deferred RLI initialization for flink writer (#18399)
by Shuo Cheng
· 3 days ago
f2dea52
feat(schema): Add support to write shredded variants (#18036)
by voonhous
· 3 days ago
fd20018
feat(vector): Add Spark VECTOR Search TVF with initial KNN algorithm (#18432)
by Rahil C
· 3 days ago
3c53b91
perf(metadata): avoid recursive calls for partition listing using catalog (#18265)
by Surya Prasanna
· 4 days ago
d60be14
fix: do not shut down disruptor thread in snapshotState in flink connector (#18446)
by Shihuan Liu
· 4 days ago
0baabba
feat(common): Add API to fetch log files created on or before given instant time (#18142)
by Nada
· 6 days ago
57c3f63
refactor: consolidate common utility classes for Flink CDC read (#18436)
by Peter Huang
· 7 days ago
d69826c
fix(flink): Trigger a failover after pending instants recommitted to b… (#18434)
by Shuo Cheng
· 7 days ago
3e662f9
feat(flink): improve bucket assignment for MOR with bucket index (#18444)
by Peter Huang
· 8 days ago
76b4fa4
feat(table-services): Support hoodie.clustering.enable.expirations to allow cleanup of failed clustering plans (intended for PreferWriterConflictResolutionStrategy) (#18302)
by Krishen
· 8 days ago
149d84f
feat(metrics): Add table-specific metrics registry support for multi-tenant scenarios (#18179)
by Prashant Wason
· 8 days ago
c9d0ffb
fix(common): close parquet reader iterator on EOF (#18407)
by Surya Prasanna
· 8 days ago
9af7f29
perf(table-services): Only attempt scheduling log compaction if number of deltacommits is at least LogCompactionBlocksThreshold (#18306)
by Krishen
· 9 days ago
180592a
feat(spark): implement column pruning for incremental queries (#17514)
by Surya Prasanna
· 9 days ago
447af5a
fix(spark): Ignore duplicate fields when merging schema in IncrementalRelation (#17776)
by Prashant Wason
· 9 days ago
3e1d300
feat(vector): Add guard for user creating nested VECTOR (#18431)
by Rahil C
· 9 days ago
ea59d60
fix(spark): validate and normalize incremental start/end instants (#18426)
by yaojiejia
· 9 days ago
6079b6a
feat: support limit push down in Hudi Flink Source V2 (#18406)
by Peter Huang
· 9 days ago
35e2bbf
refactor: add import (#18428)
by voonhous
· 10 days ago
f6a33fd
feat(lance): Implement canWrite() in HoodieSparkLanceWriter with configurable max file size for Lance (#18341)
by Vova Kolmakov
· 10 days ago
bef0c54a
[HUDI-3055] Fix hardcoded GZIP compression codec in HFileUtils (#18263)
by ZZZxDong
· 10 days ago
dd8fe99
refactor: remove HoodieWriteConfig.getOrcCompressionCodec() function (#18422)
by Shihuan Liu
· 10 days ago
75918ea
feat(blob): Create blobs in Spark SQL (#18347)
by Tim Brown
· 10 days ago
2b56eae
feat(utilities): add option to make all schema columns nullable for backwards compatibility (#17777)
by Prashant Wason
· 10 days ago
e270e25
perf: Improve Serialization Performance of BufferedRecord (#18418)
by Shuo Cheng
· 10 days ago
1ff0506
fix: Use target schema for non-FileBased/SchemaRegistry providers in SourceFormatAdapter (#17946)
by Surya Prasanna
· 10 days ago
236dc22
test(lance): Add test of bloomFilter support to TestLanceDataSource (#18388)
by Vova Kolmakov
· 11 days ago
54276a9
refactor: modularize long test methods in TestHoodieClientOnCopyOnWriteStorage (#18377)
by yaojiejia
· 11 days ago
e4bc985
Explicitly state the spark stage name (#18416)
by Surya Prasanna
· 11 days ago
e930b83
feat: Add Unshredded Variant read & write support (#17833)
by voonhous
· 11 days ago
d241b09
fix(flink): Improve splits distribution strategy for mor table w/ bucket index (#18103)
by Joy
· 11 days ago
02e5efb
fix: Fixed the issue of incorrect opName values in Flink bulk insert writing (#18313)
by empcl
· 12 days ago
1eb97b3
[HUDI-7030] Commit-based Clustering Plan Strategy (#18251)
by Prashant Wason
· 13 days ago
bb5abb6
fix: Optimizing internal schema lookup in TableSchemaResolver (#18387)
by Sivabalan Narayanan
· 14 days ago
3fc1deb
feat(vector): Support writing VECTOR to parquet and avro formats using Spark (#18328)
by Rahil C
· 14 days ago
f15e1d0
feat: add Flink source reader function for cdc splits (#18361)
by Peter Huang
· 2 weeks ago
56bc283
feat(flink): add pre-commit validation framework for Flink - Phase 2 (#18362)
by Xinli Shang
· 2 weeks ago
69fa35b
feat(metadata): Defer RLI initialization for fresh tables to optimize file group allocation (#18353)
by Sivabalan Narayanan
· 2 weeks ago
9859f9a
feat(flink): Support more efficient customized serializer for HoodieRecordGlobalLocation (#18326)
by Shuo Cheng
· 2 weeks ago
2f07364
feat: add graceful handling for post-commit failures with metrics (#18196)
by Surya Prasanna
· 2 weeks ago
4133739
perf: Skip unnecessary clean planning for MOR metadata table file-version cleaning (#17943)
by Surya Prasanna
· 2 weeks ago
c2b401e
feat(flink): Support bootstrap from RLI to local RocksDB for flink bucket assigner (#18254)
by Shuo Cheng
· 2 weeks ago
b60855d
feat: support read splits limit in Hudi Flink Source V2 (#18370)
by Peter Huang
· 2 weeks ago
817b3ad
fix(common): fix typos commited -> committed, commiting -> committing (#18363)
by Xinli Shang
· 3 weeks ago
331b018
feat(hive-sync): add Spark-catalog based metastore client implementation to avoid Hive-on-Spark classloader issues (#18203)
by Surya Prasanna
· 3 weeks ago
81a8c26
feat: support read commits limit in Hudi Flink Source V2 (#18369)
by Peter Huang
· 3 weeks ago
941ae62
fix: modify the incorrect Hive configuration in hoodie hive catalog (#18365)
by YangXiao
· 3 weeks ago
f74bf3a
fix(flink): enable batch read IT for flink source v2 (#18325)
by Peter Huang
· 3 weeks ago
3aef2ca
fix: Fix flaky test TestProtoConversionUtil#allFieldsSet_wellKnownTypesAndTimestampsAsRecords (#18352)
by Shuo Cheng
· 3 weeks ago
967e456
feat(common): add core pre-commit validation framework - Phase 1 (#18068)
by Xinli Shang
· 3 weeks ago
26b324f
fix(concurrency): detect rollback conflicts with ongoing commit operations (#18089)
by Prashant Wason
· 3 weeks ago
b634262
feat(metadata-table): add config to disable automatic deletion of MDT partitions (#18181)
by Prashant Wason
· 3 weeks ago
b0e40f6
feat(utilities): add DELETE operation support for HudiStreamer (#18088)
by Prashant Wason
· 3 weeks ago
4e21ff1
docs: Update the build instructions by mentioning profiles in README (#18310)
by Ranga Reddy
· 3 weeks ago
19c4cc9
fix: Use explicit Throwable type in AvroConversionUtils catch clause (#18342)
by Y Ethan Guo
· 3 weeks ago
cad08b1
feat(lance): Support bloom filter in Lance writer and reader (#18304)
by Vova Kolmakov
· 3 weeks ago
bbda242
feat(flink): Support write buffer based on flink managed memory (#18319)
by Shuo Cheng
· 3 weeks ago
3a1ea4b
feat(metasync): Support HMS 4.x in JDBC sync mode via automatic Thrift fallback (#18227)
by Balaji Varadarajan
· 3 weeks ago
14a549f
chore(ci): Add codecov coverage from tests running on Spark 4.0 (#18335)
by Y Ethan Guo
· 3 weeks ago
b7b0b83
chore(ci): Simplify test combinations on Spark in Github actions (#18336)
by Y Ethan Guo
· 3 weeks ago
a179555
chore(ci): Add test jobs and Codecov integration in GitHub Actions (#18225)
by Y Ethan Guo
· 3 weeks ago
a16d431
feat(hudi-sync): Publish HUDI version to Hive metastore (#18307)
by Krishen
· 4 weeks ago
f763da2
feat(conflict-resolution): Allow PreferWriterConflictResolutionStrategy to abort clustering if there is an ongoing write that is in requested state. (#18280)
by Krishen
· 4 weeks ago
74649c8
docs: RFC-102 - Spark Vector Search in Apache Hudi (#14218)
by Rahil C
· 4 weeks ago
f64c93e
fix(clustering): When inferring whether an instant is clustering, do not fail if replacecommit was rolled back already (by a concurrent writer) (#18288)
by Krishen
· 4 weeks ago
39f1f39
fix(common): Filter stray files when loading partitions in AbstractTableFileSystemView (#18047)
by Prashant Wason
· 4 weeks ago
e6723a8
fix(metadata): Allow metadata table bootstrap when pending commits are being rolled back (#18033)
by Prashant Wason
· 4 weeks ago
39cb726
feat(common): Add Policy for cleanup/rollback before each write (#18197)
by Krishen
· 4 weeks ago
ddfcc92
[HUDI-7503] Compaction execution should fail if another active writer is already executing the same plan (#18012)
by Krishen
· 4 weeks ago
b77c7e5
fix(common): Handle zero byte properties file and ensure atomic writes during modification (#18058)
by Prashant Wason
· 4 weeks ago
b31d5f7
refactor: rewrite executors tests to avoid code duplication (#18005)
by yaojiejia
· 4 weeks ago
b01ae22
fix: sort partitions after filtering for clustering planning (#18092)
by Prashant Wason
· 4 weeks ago
da244e1
feat(flink): Support create table DDL without primary key (#18086)
by Prashant Wason
· 4 weeks ago
abb5fd2
feat: add support for touch partitions in HiveSyncTool (#18064)
by Nada
· 4 weeks ago
cc3a529
fix(table-services): When single clustering group config is disabled, clustering should not create clustering groups with same number of input/output files (#18172)
by Krishen
· 4 weeks ago
f867059
fix(spark): SparkSQL write queries should correctly infer HUDI configs from spark.hoodie.* configs in spark conf (#18297)
by Krishen
· 4 weeks ago
b5daa30
feat(flink): Add helper functions to parse Kafka offset differences b… (#18125)
by Xinli Shang
· 4 weeks ago
d13310c
perf(table-services): Incremental clean planning (for COW) should ignore partitions from instants with only new file groups (#18016)
by Krishen
· 4 weeks ago
93b8e9f
feat(flink): Add Kafka offset tracking to Flink Hudi commits (#18127)
by Xinli Shang
· 5 weeks ago
22aa1fa
fix: Databricks Spark 3.4 Runtime compatibility for reading Hudi tables (#18292)
by Y Ethan Guo
· 5 weeks ago
8296df0
fix(flink): enable integration test for Hudi Flink Source V2 (#18287)
by Peter Huang
· 5 weeks ago
2c1cb39
feat(vector): add converters from spark to hoodieSchema for vectors (#18190)
by Rahil C
· 5 weeks ago
729b30c
fix: Improve config docs of enabling column stats in metadata table (#18289)
by Y Ethan Guo
· 5 weeks ago
bf4425b
fix: Remove noisy logging when table partition is empty (#18290)
by Y Ethan Guo
· 5 weeks ago
6e0d786
fix(flink): Don't perform table service during mdt initialization if streaming write is enabled (#18283)
by Shuo Cheng
· 5 weeks ago
4499b0b
feat(spark): ZooKeeper node should hold spark app id (for helping debug when lock is held for long time) (#18123)
by Krishen
· 5 weeks ago
9ba2760
feat(table-services): Support clustering file groups with earlier instant times first (#18174)
by Krishen
· 5 weeks ago
31b8706
feat(vector): Add further research for supporting VECTOR type to RFC-99 (#18184)
by Rahil C
· 5 weeks ago
3dcf4c6
feat(client): Add pre-write validator framework (#18239)
by Nada
· 5 weeks ago
3139a19
feat(table-services): Allow users to not parallelize each partition with engine context during clustering planning (#18191)
by Krishen
· 5 weeks ago