refactor(amber): stop hardcoding S3 in REST catalog init (#4988)

### What changes were proposed in this PR?

Stop hardcoding `s3.endpoint`, `s3.region`, `s3.path-style-access`, `s3.access-key-id`, and `s3.secret-access-key` at REST-catalog init in both `IcebergUtil.createRestCatalog` (Scala) and `iceberg_utils.create_rest_catalog` (Python). Both helpers now pass only `warehouse` + catalog `uri` (and, on the Scala side, the `FileIO` impl hint).

**Why:** When a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs `warehouse` + `uri`; Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcoded `StorageConfig.s3*` values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.

`StorageConfig.s3*` itself is kept: `pytexera/storage/large_binary_manager.py` still uses it for the non-Iceberg `texera-large-binaries` bucket (R UDF large-binary support), which is out of scope.

### Any related issues, documentation, discussions?

Closes #4987

### How was this PR tested?

- `sbt "WorkflowCore/compile"` passes, verifying that no other Scala caller depends on the removed properties.
- The Python edits parse cleanly via `ast.parse`; the only caller (`iceberg_catalog_instance.py`) is updated to match the new `create_rest_catalog` signature.

End-to-end verification (a warehouse with its own S3 settings → REST catalog opened with only `warehouse` + `uri` → table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands, I'll add the integration test on top of it.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
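To make the change concrete, here is a minimal sketch of the trimmed-down property set the client now sends at catalog init. The helper name and values are hypothetical (this is not Texera's actual `create_rest_catalog` body); the point is that the S3 keys removed by this PR no longer appear on the client side, because Lakekeeper serves them back from the warehouse record.

```python
# Hypothetical sketch of the post-change client-side properties.
# Only the catalog endpoint and the warehouse name are passed;
# s3.endpoint, s3.region, s3.path-style-access, and credentials
# are resolved server-side by Lakekeeper from the warehouse record.
def build_rest_catalog_props(uri: str, warehouse: str) -> dict:
    return {
        "type": "rest",
        "uri": uri,
        "warehouse": warehouse,
        # Note what is deliberately absent: no "s3.endpoint",
        # "s3.region", "s3.access-key-id", "s3.secret-access-key",
        # or "s3.path-style-access" keys on the client.
    }

# Example values; any real deployment would supply its own.
props = build_rest_catalog_props(
    "http://localhost:8181/catalog", "demo_warehouse"
)
assert "s3.endpoint" not in props
assert "s3.access-key-id" not in props
```

Because the server owns the storage settings, two warehouses on the same Lakekeeper can point at different buckets or endpoints without any client-side configuration change.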
Apache Texera (Incubating) is an open-source platform for human-AI collaborative data science using visual workflows. It enables human analysts to construct, execute, and refine data analysis tasks through an intuitive GUI, assisted by AI agents that understand natural-language instructions. Texera is well suited for a wide range of applications, including “AI for Science,” by making advanced AI and data science capabilities accessible to a broader community. It can run on a laptop for local use or be deployed in the cloud to support scalable processing of large datasets.
The platform has the following key features:
Please cite Texera as:
@article{DBLP:journals/pvldb/WangHNKALLDL24,
  author    = {Zuozhi Wang and Yicong Huang and Shengquan Ni and Avinash Kumar and Sadeem Alsudais and Xiaozhen Liu and Xinyuan Lin and Yunyan Ding and Chen Li},
  title     = {Texera: {A} System for Collaborative and Interactive Data Analytics Using Workflows},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {17},
  number    = {11},
  pages     = {3580--3588},
  year      = {2024},
  url       = {https://www.vldb.org/pvldb/vol17/p3580-wang.pdf},
  timestamp = {Thu, 19 Sep 2024 13:09:37 +0200},
  biburl    = {https://dblp.org/rec/journals/pvldb/WangHNKALLDL24.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}