refactor(amber): stop hardcoding S3 in REST catalog init (#4988)

### What changes were proposed in this PR?

Stop hardcoding `s3.endpoint`, `s3.region`, `s3.path-style-access`,
`s3.access-key-id` and `s3.secret-access-key` at REST-catalog init in
both `IcebergUtil.createRestCatalog` (Scala) and
`iceberg_utils.create_rest_catalog` (Python). Both helpers now pass only
`warehouse` + catalog `uri` (and on the Scala side the `FileIO` impl
hint).
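To make the shape of the change concrete, here is a hedged sketch of what the trimmed Python helper builds (function and parameter names are illustrative, not the exact code in `iceberg_utils.py`). Previously the properties dict also carried `s3.endpoint`, `s3.region`, `s3.path-style-access` and the access keys; now only the catalog URI and warehouse name are sent:

```python
def build_rest_catalog_properties(uri: str, warehouse: str) -> dict[str, str]:
    """Minimal client-side properties for a Lakekeeper REST catalog.

    No s3.* keys: Lakekeeper resolves the warehouse's storage config
    server-side and serves it back at catalog init.
    """
    return {
        "uri": uri,              # REST catalog endpoint
        "warehouse": warehouse,  # warehouse registered on the Lakekeeper server
    }
```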

**Why:** When a Lakekeeper warehouse is created, its S3 settings
(endpoint, region, credentials, path-style) are registered against that
warehouse on the server. At catalog init the client only needs
`warehouse` + `uri` — Lakekeeper resolves the S3 config from the
warehouse record and serves it back. The hardcoded `StorageConfig.s3*`
values on the client were redundant, and forcing them everywhere also
pinned every warehouse to the single system bucket. Removing them lets
each warehouse own its own storage settings.
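The resolution step relies on the Iceberg REST catalog spec: at init the client calls `GET {uri}/v1/config?warehouse=<name>` and merges the server's `defaults` and `overrides` with its own properties, with precedence defaults < client-supplied < overrides. A minimal sketch of that merge (not Lakekeeper's or the client library's actual code):

```python
def merge_rest_config(
    server_defaults: dict[str, str],
    client_props: dict[str, str],
    server_overrides: dict[str, str],
) -> dict[str, str]:
    """Effective catalog config per the Iceberg REST spec's precedence:
    server defaults, then client properties, then server overrides."""
    merged = dict(server_defaults)
    merged.update(client_props)
    merged.update(server_overrides)
    return merged
```

Because the warehouse's S3 settings arrive via `defaults`/`overrides`, hardcoding them on the client is redundant at best and, as above, wrong once warehouses stop sharing one bucket.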

`StorageConfig.s3*` itself is kept —
`pytexera/storage/large_binary_manager.py` still uses it for the
non-Iceberg `texera-large-binaries` bucket (R UDF large-binary support),
which is out of scope.

### Any related issues, documentation, discussions?

Closes #4987

### How was this PR tested?

- `sbt "WorkflowCore/compile"` — passes, confirming no remaining Scala
  caller references the removed properties.
- Python edits parse cleanly via `ast.parse`; the only caller
(`iceberg_catalog_instance.py`) is updated to match the new
`create_rest_catalog` signature.
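The `ast.parse` check amounts to a syntax-only validation that doesn't import (and therefore doesn't execute) the edited modules; roughly:

```python
import ast
from pathlib import Path

def parses_cleanly(path: str) -> bool:
    """Syntax-check a Python file without importing it."""
    try:
        ast.parse(Path(path).read_text(), filename=path)
        return True
    except SyntaxError:
        return False
```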

End-to-end verification (warehouse with its own S3 settings → REST
catalog opened with only `warehouse` + `uri` → table round-trip)
requires a running Lakekeeper, which CI doesn't have today. #4276
(draft) wires Lakekeeper into CI; once that lands I'll add the
integration test on top of it.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>