refactor(amber): stop hardcoding S3 in REST catalog init (#4988)

### What changes were proposed in this PR?

Stop hardcoding `s3.endpoint`, `s3.region`, `s3.path-style-access`, `s3.access-key-id`, and `s3.secret-access-key` at REST-catalog init in both `IcebergUtil.createRestCatalog` (Scala) and `iceberg_utils.create_rest_catalog` (Python). Both helpers now pass only `warehouse` + catalog `uri` (and, on the Scala side, the `FileIO` impl hint).

**Why:** When a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs `warehouse` + `uri`; Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcoded `StorageConfig.s3*` values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.

`StorageConfig.s3*` itself is kept: `pytexera/storage/large_binary_manager.py` still uses it for the non-Iceberg `texera-large-binaries` bucket (R UDF large-binary support), which is out of scope.

### Any related issues, documentation, discussions?

Closes #4987

### How was this PR tested?

- `sbt "WorkflowCore/compile"` passes, verifying that no other Scala caller depends on the removed properties.
- The Python edits parse cleanly via `ast.parse`; the only caller (`iceberg_catalog_instance.py`) is updated to match the new `create_rest_catalog` signature.

End-to-end verification (a warehouse with its own S3 settings → REST catalog opened with only `warehouse` + `uri` → table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands, I'll add the integration test on top of it.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
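To make the change concrete, here is a minimal sketch of the trimmed-down property set the client now sends at catalog init. The helper name and values are hypothetical (this is not Texera's actual `create_rest_catalog` body); the point is that the S3 keys removed by this PR no longer appear on the client side, because Lakekeeper serves them back from the warehouse record.

```python
# Hypothetical sketch of the post-change client-side properties.
# Only the catalog endpoint and the warehouse name are passed;
# s3.endpoint, s3.region, s3.path-style-access, and credentials
# are resolved server-side by Lakekeeper from the warehouse record.
def build_rest_catalog_props(uri: str, warehouse: str) -> dict:
    return {
        "type": "rest",
        "uri": uri,
        "warehouse": warehouse,
        # Note what is deliberately absent: no "s3.endpoint",
        # "s3.region", "s3.access-key-id", "s3.secret-access-key",
        # or "s3.path-style-access" keys on the client.
    }

# Example values; any real deployment would supply its own.
props = build_rest_catalog_props(
    "http://localhost:8181/catalog", "demo_warehouse"
)
assert "s3.endpoint" not in props
assert "s3.access-key-id" not in props
```

Because the server owns the storage settings, two warehouses on the same Lakekeeper can point at different buckets or endpoints without any client-side configuration change.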
Apache Texera (Incubating) is an open-source platform for human-AI collaborative data science using visual workflows. It enables human analysts to construct, execute, and refine data analysis tasks through an intuitive GUI, assisted by AI agents that understand natural-language instructions. Texera is well suited for a wide range of applications, including “AI for Science,” by making advanced AI and data science capabilities accessible to a broader community. It can run on a laptop for local use or be deployed in the cloud to support scalable processing of large datasets.
The platform has the following key features:
Please cite Texera as:
@article{DBLP:journals/pvldb/WangHNKALLDL24,
  author    = {Zuozhi Wang and Yicong Huang and Shengquan Ni and Avinash Kumar and Sadeem Alsudais and Xiaozhen Liu and Xinyuan Lin and Yunyan Ding and Chen Li},
  title     = {Texera: {A} System for Collaborative and Interactive Data Analytics Using Workflows},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {17},
  number    = {11},
  pages     = {3580--3588},
  year      = {2024},
  url       = {https://www.vldb.org/pvldb/vol17/p3580-wang.pdf},
  timestamp = {Thu, 19 Sep 2024 13:09:37 +0200},
  biburl    = {https://dblp.org/rec/journals/pvldb/WangHNKALLDL24.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}