---
sidebar_label: Quickstart Guides
title: Observability Quickstart Guides
sidebar_position: 1
---

 # Observability Quickstart Guides

 On this page, you can find the following guides to set up an observability stack **based on the instructions in the [Flink quickstart guide](quickstart/flink.md)**:

 - [Observability with Prometheus, Loki and Grafana](#observability-with-prometheus-loki-and-grafana)

 ## Observability with Prometheus, Loki and Grafana

 We provide a minimal quickstart configuration for application observability with Prometheus (metric aggregation system), Loki (log aggregation system) and Grafana (dashboard system).

 The quickstart configuration comes with two metric dashboards:

 - `Fluss – overview`: Selected metrics to observe the overall cluster status
 - `Fluss – detail`: Most of the metrics listed in the [metrics list](monitor-metrics.md#metrics-list)

 Follow the instructions below to add observability capabilities to your setup.

 1. Download the <a href={ require("../../assets/fluss-quickstart-observability.zip").default } target="_blank">observability quickstart configuration</a> and extract the ZIP archive into your working directory.
 After extracting the archive, the contents of the working directory should look as follows.

 ```
 ├── docker-compose.yml              # docker compose manifest from quickstart guide
 └── fluss-quickstart-observability  # downloaded and extracted ZIP archive
     ├── grafana
     │   ├── grafana.ini
     │   └── provisioning
     │       ├── dashboards
     │       │   ├── default.yml
     │       │   └── fluss
     │       │       └── ...
     │       └── datasources
     │           └── default.yml
     ├── prometheus
     │   └── prometheus.yml
     └── slf4j
         └── ...
 ```
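To double-check the extraction before continuing, you can run a small bash check like the sketch below. It only tests for the files this page refers to (the Grafana and Prometheus configuration and the Logback file that the Dockerfile in the next step copies); the contents of the `...` directories are not checked, and the function name `check_observability_layout` is hypothetical.

```shell
# Hypothetical helper: verify that the extracted quickstart configuration
# contains the files referenced on this page. The optional first argument is
# the extracted directory (default: ./fluss-quickstart-observability).
check_observability_layout() {
  local base="${1:-fluss-quickstart-observability}"
  local missing=0
  local f
  for f in \
    grafana/grafana.ini \
    grafana/provisioning/dashboards/default.yml \
    grafana/provisioning/datasources/default.yml \
    prometheus/prometheus.yml \
    slf4j/logback-loki-console.xml; do
    # report every missing file instead of stopping at the first one
    if [ ! -f "$base/$f" ]; then
      echo "missing: $base/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the working directory that holds docker-compose.yml):
#   check_observability_layout && echo "layout OK"
```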

 2. Next, configure Fluss to ship its logs to Loki. We use [Loki4j](https://loki4j.github.io/loki-logback-appender/), a Logback appender that pushes logs to Loki, so Fluss has to use Logback as its logging backend.
 The Dockerfile below replaces the default Log4j SLF4J binding with Logback and adds the Loki4j appender. Save it to a file named `fluss-slf4j-logback.Dockerfile` in your working directory.

 ```dockerfile
 ARG FLUSS_VERSION

 FROM apache/fluss:$FLUSS_DOCKER_VERSION$

 # remove default logging backend from classpath and add logback to classpath
 RUN rm -rf ${FLUSS_HOME}/lib/log4j-slf4j-impl-*.jar && \
     wget https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar -P ${FLUSS_HOME}/lib/ && \
     wget https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.13/logback-core-1.2.13.jar -P ${FLUSS_HOME}/lib/

 # add loki4j logback appender to classpath
 RUN wget https://repo1.maven.org/maven2/com/github/loki4j/loki-logback-appender/1.4.2/loki-logback-appender-1.4.2.jar -P ${FLUSS_HOME}/lib/

 # logback configuration that sends logs to Loki
 COPY fluss-quickstart-observability/slf4j/logback-loki-console.xml ${FLUSS_HOME}/conf/logback-console.xml
 ```

 :::note
 For details on configuring Logback for Fluss, see [Configuring Logback](logging.md#configuring-logback).
 :::
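To confirm that the classpath of the built image changed as intended, you can list the JARs in `$FLUSS_HOME/lib` and check them with a bash helper like the sketch below. The function name `check_logging_jars` is hypothetical, and the JAR name patterns are taken from the Dockerfile above. The usage comment assumes the image was built by Docker Compose under the tag `fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$` (as in the manifest of the next step) and that `FLUSS_HOME` is set in the image environment, which the Dockerfile's use of `${FLUSS_HOME}` suggests.

```shell
# Hypothetical helper: reads JAR file names (one per line) on stdin and checks
# that the logging classpath matches fluss-slf4j-logback.Dockerfile.
check_logging_jars() {
  local jars status=0 pattern
  jars=$(cat)
  # the Log4j SLF4J binding must be gone, otherwise SLF4J may not bind to Logback
  if printf '%s\n' "$jars" | grep -q '^log4j-slf4j-impl-.*\.jar$'; then
    echo "unexpected: log4j-slf4j-impl is still on the classpath" >&2
    status=1
  fi
  # Logback and the Loki4j appender must be present
  for pattern in '^logback-classic-.*\.jar$' '^logback-core-.*\.jar$' '^loki-logback-appender-.*\.jar$'; do
    if ! printf '%s\n' "$jars" | grep -q "$pattern"; then
      echo "missing JAR matching: $pattern" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (after the image has been built):
#   docker run --rm --entrypoint sh fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$ \
#     -c 'ls "$FLUSS_HOME/lib"' | check_logging_jars && echo "classpath OK"
```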

 3. Additionally, adapt your `docker-compose.yml` to:

 - add containers for Prometheus, Loki and Grafana and mount their configuration,
 - build and use the new Fluss image defined in `fluss-slf4j-logback.Dockerfile`,
 - configure Fluss to expose metrics via Prometheus,
 - set the `APP_NAME` environment variable of each Fluss server to the application name under which its logs are displayed in Grafana, and
 - configure Flink to expose metrics via Prometheus.

 The simplest way to do this is to replace the contents of your `docker-compose.yml` with the manifest below.

 ```yaml
 services:
   #begin Fluss cluster
   coordinator-server:
     image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
     build:
       args:
         FLUSS_VERSION: $FLUSS_VERSION$
       dockerfile: fluss-slf4j-logback.Dockerfile
     command: coordinatorServer
     depends_on:
       - zookeeper
     environment:
       - |
         FLUSS_PROPERTIES=
         zookeeper.address: zookeeper:2181
         bind.listeners: FLUSS://coordinator-server:9123
         remote.data.dir: /tmp/fluss/remote-data
         datalake.format: paimon
         datalake.paimon.metastore: filesystem
         datalake.paimon.warehouse: /tmp/paimon
         metrics.reporters: prometheus
         metrics.reporter.prometheus.port: 9250
         logback.configurationFile: logback-loki-console.xml
       - APP_NAME=coordinator-server
   tablet-server:
     image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
     build:
       args:
         FLUSS_VERSION: $FLUSS_VERSION$
       dockerfile: fluss-slf4j-logback.Dockerfile
     command: tabletServer
     depends_on:
       - coordinator-server
     environment:
       - |
         FLUSS_PROPERTIES=
         zookeeper.address: zookeeper:2181
         bind.listeners: FLUSS://tablet-server:9123
         data.dir: /tmp/fluss/data
         remote.data.dir: /tmp/fluss/remote-data
         kv.snapshot.interval: 0s
         datalake.format: paimon
         datalake.paimon.metastore: filesystem
         datalake.paimon.warehouse: /tmp/paimon
         metrics.reporters: prometheus
         metrics.reporter.prometheus.port: 9250
         logback.configurationFile: logback-loki-console.xml
       - APP_NAME=tablet-server
   zookeeper:
     restart: always
     image: zookeeper:3.9.2
   #end
   #begin Flink cluster
   jobmanager:
     image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
     ports:
       - "8083:8081"
     command: jobmanager
     environment:
       - |
         FLINK_PROPERTIES=
         jobmanager.rpc.address: jobmanager
         metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
         metrics.reporter.prom.port: 9250
     volumes:
       - shared-tmpfs:/tmp/paimon
   taskmanager:
     image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
     depends_on:
       - jobmanager
     command: taskmanager
     environment:
       - |
         FLINK_PROPERTIES=
         jobmanager.rpc.address: jobmanager
         taskmanager.numberOfTaskSlots: 10
         taskmanager.memory.process.size: 2048m
         taskmanager.memory.framework.off-heap.size: 256m
         metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
         metrics.reporter.prom.port: 9250
     volumes:
       - shared-tmpfs:/tmp/paimon
   #end
   #begin observability
   prometheus:
     image: bitnami/prometheus:2.55.1-debian-12-r0
     ports:
       - "9092:9090"
     volumes:
       - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
   loki:
     image: grafana/loki:3.3.2
     ports:
       - "3102:3100"
   grafana:
     image: grafana/grafana:11.4.0
     ports:
       - "3002:3000"
     depends_on:
       - prometheus
       - loki
     volumes:
       - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
   #end

 volumes:
   shared-tmpfs:
     driver: local
     driver_opts:
       type: "tmpfs"
       device: "tmpfs"
 ```
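With this manifest, each Fluss and Flink container serves its metrics in the Prometheus text format on port 9250 (`metrics.reporter.prometheus.port` for Fluss, `metrics.reporter.prom.port` for Flink). Once the stack is running, you can inspect such an endpoint directly. The bash helper below is a sketch that prints the distinct metric names from text-format input; the function name is hypothetical, and the `/metrics` path as well as calling `wget` inside the Fluss container (the Dockerfile above already relies on `wget` at build time) are assumptions.

```shell
# Hypothetical helper: print the distinct metric names contained in
# Prometheus text-format input read from stdin.
list_metric_names() {
  # drop "# HELP"/"# TYPE" comment lines and blank lines, then cut every
  # sample line at the first '{' (start of labels) or space (start of value)
  grep -v -e '^#' -e '^[[:space:]]*$' | sed -E 's/[{ ].*$//' | sort -u
}

# Usage (stack running):
#   docker compose exec coordinator-server wget -qO- http://localhost:9250/metrics \
#     | list_metric_names | head -n 20
```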

 Then run

 ```shell
 # note the --build flag!
 docker compose up -d --build
 ```

 to apply the changes.

 :::warning
 This recreates the `shared-tmpfs` volume, so all data (created tables, running jobs, etc.) is lost.
 :::

 Make sure that the modified and added containers are up and running using

 ```shell
 docker container ls -a
 ```
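The `-a` flag also lists stopped containers, so a container that failed during startup appears with a state such as `exited`. To spot those quickly, you can print only name and state with the standard `--format` placeholders `.Names` and `.State` and filter the result with a bash helper like the sketch below (`not_running` is a hypothetical name). Note that `docker container ls -a` lists all containers on the host, not only those of this project.

```shell
# Hypothetical helper: reads "<name> <state>" lines on stdin, prints every
# container whose state is not "running", and fails if there is at least one.
not_running() {
  awk 'NF && $2 != "running" { print; bad = 1 } END { exit bad }'
}

# Usage:
#   docker container ls -a --format '{{.Names}} {{.State}}' | not_running \
#     || echo "some containers are not running"
```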

 4. Now you are all set! You can visit

 - Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and to observe metrics of the Fluss and Flink clusters with the [provided dashboards](http://localhost:3002/dashboards), or
 - the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
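Besides the web UIs, Prometheus and Loki expose HTTP APIs on the host ports mapped in the manifest above (9092 and 3102). The sketch below uses Prometheus' `/api/v1/targets` endpoint to check that every scrape target is healthy, and Loki's `/loki/api/v1/labels` and `/loki/api/v1/label/<name>/values` endpoints to list log labels and their values. It assumes `curl` on the host and avoids `jq` by matching the single-line JSON these APIs return with `grep` and `sed`, which is fragile for values containing commas or quotes; the function names are hypothetical. Which targets are scraped and which label carries `APP_NAME` depend on the bundled `prometheus.yml` and Logback configuration, which this page does not show.

```shell
# Hypothetical helper: succeeds only if every active target in a Prometheus
# /api/v1/targets JSON response (read from stdin) reports "health":"up".
all_targets_up() {
  local healths
  # pull out each health value; tolerate optional whitespace around ':'
  healths=$(grep -oE '"health"[[:space:]]*:[[:space:]]*"[a-z]+"' \
    | sed -E 's/.*"([a-z]+)"$/\1/')
  # no targets at all counts as a failure
  [ -n "$healths" ] || return 1
  # fail if any value other than "up" (e.g. "down", "unknown") is present
  ! printf '%s\n' "$healths" | grep -qv '^up$'
}

# Hypothetical helper: prints the strings of the "data" array in a
# single-line Loki label API response, one per line.
loki_data_values() {
  sed -nE 's/.*"data"[[:space:]]*:[[:space:]]*\[([^]]*)\].*/\1/p' \
    | tr ',' '\n' \
    | sed -nE 's/^[[:space:]]*"([^"]*)"[[:space:]]*$/\1/p'
}

# Usage (stack running):
#   curl -s http://localhost:9092/api/v1/targets | all_targets_up && echo "all targets up"
#   curl -s http://localhost:3102/loki/api/v1/labels | loki_data_values
#   # replace <label> with a label name from the previous output
#   curl -s 'http://localhost:3102/loki/api/v1/label/<label>/values' | loki_data_values
```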