This guide walks through setting up Lakekeeper, which serves as the REST catalog service for Texera's workflow result storage.
For background on why Texera uses a REST catalog, see Issue #4126.
On macOS or Linux, install Lakekeeper with Homebrew:
brew install lakekeeper
Verify the installation by running:
lakekeeper --version
Alternatively, you can download a pre-built binary from the releases page at https://github.com/lakekeeper/lakekeeper/releases and place it on your $PATH.
Create a database using the SQL script in Texera's repository:
psql -f sql/texera_lakekeeper.sql
Edit the User Configuration section at the top of bin/bootstrap-lakekeeper.sh.
First, set the PostgreSQL connection URLs used by Lakekeeper:
-LAKEKEEPER__PG_DATABASE_URL_READ=""
-LAKEKEEPER__PG_DATABASE_URL_WRITE=""
+LAKEKEEPER__PG_DATABASE_URL_READ="postgres://<user>:<urlencoded_password>@<host>:5432/texera_lakekeeper"
+LAKEKEEPER__PG_DATABASE_URL_WRITE="postgres://<user>:<urlencoded_password>@<host>:5432/texera_lakekeeper"
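The `<urlencoded_password>` placeholder means that reserved characters in the password (such as `@`, `:`, or `/`) must be percent-encoded before the password goes into the URL. The following is a minimal sketch of one way to do that in a POSIX shell, assuming the password contains only ASCII characters. `urlencode` is a hypothetical helper written for this illustration and is not part of Texera's scripts.

```shell
# Hypothetical helper: percent-encode an ASCII string for use inside a URL.
# The function body runs in a subshell, so its variables do not leak out.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [a-zA-Z0-9._~-]) out=$out$c ;;              # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;      # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
)

# Example with a made-up password:
urlencode 'p@ss:w/rd'   # prints p%40ss%3Aw%2Frd
```

The printed value is what replaces `<urlencoded_password>` in both connection URLs.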
If you have customized storage-related values in common/config/src/main/resources/storage.conf (for example, the bucket name, S3 endpoint, or MinIO credentials), check the environment variables below in the script and update their values to match:
# Storage settings — must stay in sync with storage.conf
# if needed, update the default values after `:-` to match storage.conf
STORAGE_ICEBERG_CATALOG_REST_URI="${STORAGE_ICEBERG_CATALOG_REST_URI:-http://localhost:8181/catalog}"
STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME="${STORAGE_ICEBERG_CATALOG_REST_WAREHOUSE_NAME:-texera}"
STORAGE_ICEBERG_CATALOG_REST_REGION="${STORAGE_ICEBERG_CATALOG_REST_REGION:-us-west-2}"
STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET="${STORAGE_ICEBERG_CATALOG_REST_S3_BUCKET:-texera-iceberg}"
STORAGE_S3_ENDPOINT="${STORAGE_S3_ENDPOINT:-http://localhost:9000}"
STORAGE_S3_AUTH_USERNAME="${STORAGE_S3_AUTH_USERNAME:-texera_minio}"
STORAGE_S3_AUTH_PASSWORD="${STORAGE_S3_AUTH_PASSWORD:-password}"
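The `${NAME:-default}` form used above is standard shell parameter expansion: the value after `:-` applies only when the variable is unset or empty. The short sketch below shows both cases for one of the variables. The second endpoint is a made-up example value, and `unset_result` and `override_result` are illustrative names only.

```shell
# Case 1: the variable is unset, so the default after `:-` is used.
unset STORAGE_S3_ENDPOINT
STORAGE_S3_ENDPOINT="${STORAGE_S3_ENDPOINT:-http://localhost:9000}"
unset_result=$STORAGE_S3_ENDPOINT
echo "$unset_result"      # prints http://localhost:9000

# Case 2: the variable already has a value, so the default is ignored.
# (minio.example.internal is a hypothetical host for illustration.)
STORAGE_S3_ENDPOINT="http://minio.example.internal:9000"
STORAGE_S3_ENDPOINT="${STORAGE_S3_ENDPOINT:-http://localhost:9000}"
override_result=$STORAGE_S3_ENDPOINT
echo "$override_result"   # prints http://minio.example.internal:9000
```

In practice this suggests two ways to customize a value: edit the default in the script, or export the variable in your shell before running the script. The second option assumes the script does not reassign the variable elsewhere.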
Run the following script from the root of the Texera repository:
bash bin/bootstrap-lakekeeper.sh
The script will:
Check that Lakekeeper is healthy by running:
curl http://localhost:8181/health
You should see a JSON response with "health":"ok".
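If Lakekeeper takes a moment to start, a single `curl` may fail before the service is ready. The sketch below retries the same health check until the response contains `"health":"ok"` or the attempts run out. `wait_for_health` is a hypothetical helper, not part of Texera or Lakekeeper, and the retry count and delay are arbitrary defaults.

```shell
# Hypothetical helper: poll a health URL until it reports "health":"ok".
# Usage: wait_for_health URL [MAX_TRIES] [DELAY_SECONDS]
# Returns 0 once the service is healthy, 1 if all attempts fail.
wait_for_health() {
  url=$1
  tries=${2:-30}
  delay=${3:-1}
  while [ "$tries" -gt 0 ]; do
    # -f: fail on HTTP errors, -s: no progress output, --max-time: do not hang
    if curl -fs --max-time 5 "$url" 2>/dev/null |
        grep -Eq '"health"[[:space:]]*:[[:space:]]*"ok"'; then
      return 0
    fi
    tries=$((tries - 1))
    # Pause between attempts, but not after the last one.
    if [ "$tries" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example:
#   wait_for_health http://localhost:8181/health && echo "Lakekeeper is up"
```

The pattern tolerates optional whitespace around the colon, since the exact formatting of the JSON response is not specified here.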
Verify that the warehouse has been created by running:
curl http://localhost:8181/management/v1/warehouse
You should see a warehouse in the response.
To make Texera actually use the Lakekeeper REST catalog you just set up, edit common/config/src/main/resources/storage.conf:
storage {
  iceberg {
    catalog {
-     type = postgres
+     type = rest
      ...
    }
  }
}
Lakekeeper now serves as the Iceberg REST catalog for Texera. Workflows that produce Iceberg results will write them to the S3 bucket through this catalog.