---
sidebar_label: Quickstart Guides
title: Observability Quickstart Guides
sidebar_position: 1
---
# Observability Quickstart Guides
On this page, you can find the following guides to set up an observability stack **based on the instructions in the [Flink quickstart guide](quickstart/flink.md)**:
- [Observability with Prometheus, Loki and Grafana](#observability-with-prometheus-loki-and-grafana)
## Observability with Prometheus, Loki and Grafana
We provide a minimal quickstart configuration for application observability with Prometheus (metric aggregation system), Loki (log aggregation system) and Grafana (dashboard system).
The quickstart configuration comes with two metric dashboards:
- `Fluss – overview`: Selected metrics to observe the overall cluster status
- `Fluss – detail`: The majority of the metrics listed in the [metrics list](monitor-metrics.md#metrics-list)
Follow the instructions below to add observability capabilities to your setup.
1. Download the <a href={ require("../../assets/fluss-quickstart-observability.zip").default } target="_blank">observability quickstart configuration</a> and extract the ZIP archive in your working directory.
After extracting the archive, the contents of the working directory should be as follows.
```
├── docker-compose.yml                  # docker compose manifest from quickstart guide
└── fluss-quickstart-observability      # downloaded and extracted ZIP archive
    ├── grafana
    │   ├── grafana.ini
    │   └── provisioning
    │       ├── dashboards
    │       │   ├── default.yml
    │       │   └── fluss
    │       │       └── ...
    │       └── datasources
    │           └── default.yml
    ├── prometheus
    │   └── prometheus.yml
    └── slf4j
        └── ...
```
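A quick way to confirm the layout is to check for the files this guide refers to later on. The following is a minimal sketch, not part of the quickstart archive: `check_layout` is a hypothetical helper, and it only checks the paths named explicitly on this page (the directories shown as `...` are not inspected).

```shell
# Print every expected path that is missing below the given working directory.
# Returns 0 if all paths exist, 1 otherwise. (Hypothetical helper for illustration.)
check_layout() {
  cl_root="${1:-.}"
  cl_missing=0
  for cl_path in \
    docker-compose.yml \
    fluss-quickstart-observability/grafana/grafana.ini \
    fluss-quickstart-observability/grafana/provisioning/dashboards/default.yml \
    fluss-quickstart-observability/prometheus/prometheus.yml \
    fluss-quickstart-observability/slf4j/logback-loki-console.xml
  do
    if [ ! -e "$cl_root/$cl_path" ]; then
      echo "missing: $cl_path"
      cl_missing=1
    fi
  done
  return "$cl_missing"
}

# Check the current working directory.
check_layout . || echo "Layout incomplete: re-extract the archive into this directory."
```

If nothing is printed, all of the listed files are in place.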
2. Next, you need to configure Fluss to send its logs to Loki. We will use [Loki4j](https://loki4j.github.io/loki-logback-appender/), a Loki appender for Logback, so Logback becomes the logging backend.
The container manifest below configures Fluss to use Logback and Loki4j. Save it to a file named `fluss-slf4j-logback.Dockerfile` in your working directory.
```dockerfile
ARG FLUSS_VERSION
FROM apache/fluss:$FLUSS_DOCKER_VERSION$
# remove default logging backend from classpath and add logback to classpath
RUN rm -rf ${FLUSS_HOME}/lib/log4j-slf4j-impl-*.jar && \
    wget https://repo1.maven.org/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar -P ${FLUSS_HOME}/lib/ && \
    wget https://repo1.maven.org/maven2/ch/qos/logback/logback-core/1.2.13/logback-core-1.2.13.jar -P ${FLUSS_HOME}/lib/
# add loki4j logback appender to classpath
RUN wget https://repo1.maven.org/maven2/com/github/loki4j/loki-logback-appender/1.4.2/loki-logback-appender-1.4.2.jar -P ${FLUSS_HOME}/lib/
# logback configuration that sends logs to loki
COPY fluss-quickstart-observability/slf4j/logback-loki-console.xml ${FLUSS_HOME}/conf/logback-console.xml
```
:::note
Detailed configuration instructions for Fluss and Logback can be found [here](logging.md#configuring-logback).
:::
3. Additionally, you need to adapt the `docker-compose.yml` to
- add containers for Prometheus, Loki and Grafana and mount the corresponding configuration directories.
- build and use the new Fluss image from the manifest `fluss-slf4j-logback.Dockerfile`.
- configure Fluss to expose metrics via Prometheus.
- set the application name that is shown with the logs in Grafana as an environment variable (`APP_NAME`).
- configure Flink to expose metrics via Prometheus.
To do this, you can simply copy the manifest below into your `docker-compose.yml`.
```yaml
services:
  #begin Fluss cluster
  coordinator-server:
    image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
    build:
      args:
        FLUSS_VERSION: $FLUSS_VERSION$
      dockerfile: fluss-slf4j-logback.Dockerfile
    command: coordinatorServer
    depends_on:
      - zookeeper
    environment:
      - |
        FLUSS_PROPERTIES=
        zookeeper.address: zookeeper:2181
        bind.listeners: FLUSS://coordinator-server:9123
        remote.data.dir: /tmp/fluss/remote-data
        datalake.format: paimon
        datalake.paimon.metastore: filesystem
        datalake.paimon.warehouse: /tmp/paimon
        metrics.reporters: prometheus
        metrics.reporter.prometheus.port: 9250
        logback.configurationFile: logback-loki-console.xml
      - APP_NAME=coordinator-server
  tablet-server:
    image: fluss-slf4j-logback:$FLUSS_DOCKER_VERSION$
    build:
      args:
        FLUSS_VERSION: $FLUSS_VERSION$
      dockerfile: fluss-slf4j-logback.Dockerfile
    command: tabletServer
    depends_on:
      - coordinator-server
    environment:
      - |
        FLUSS_PROPERTIES=
        zookeeper.address: zookeeper:2181
        bind.listeners: FLUSS://tablet-server:9123
        data.dir: /tmp/fluss/data
        remote.data.dir: /tmp/fluss/remote-data
        kv.snapshot.interval: 0s
        datalake.format: paimon
        datalake.paimon.metastore: filesystem
        datalake.paimon.warehouse: /tmp/paimon
        metrics.reporters: prometheus
        metrics.reporter.prometheus.port: 9250
        logback.configurationFile: logback-loki-console.xml
      - APP_NAME=tablet-server
  zookeeper:
    restart: always
    image: zookeeper:3.9.2
  #end
  #begin Flink cluster
  jobmanager:
    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
    ports:
      - "8083:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
        metrics.reporter.prom.port: 9250
    volumes:
      - shared-tmpfs:/tmp/paimon
  taskmanager:
    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
    depends_on:
      - jobmanager
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 10
        taskmanager.memory.process.size: 2048m
        taskmanager.memory.framework.off-heap.size: 256m
        metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
        metrics.reporter.prom.port: 9250
    volumes:
      - shared-tmpfs:/tmp/paimon
  #end
  #begin observability
  prometheus:
    image: bitnami/prometheus:2.55.1-debian-12-r0
    ports:
      - "9092:9090"
    volumes:
      - ./fluss-quickstart-observability/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
  loki:
    image: grafana/loki:3.3.2
    ports:
      - "3102:3100"
  grafana:
    image: grafana/grafana:11.4.0
    ports:
      - "3002:3000"
    depends_on:
      - prometheus
      - loki
    volumes:
      - ./fluss-quickstart-observability/grafana:/etc/grafana:ro
  #end
volumes:
  shared-tmpfs:
    driver: local
    driver_opts:
      type: "tmpfs"
      device: "tmpfs"
```
Then run
```shell
# note the --build flag!
docker compose up -d --build
```
to apply the changes.
:::warning
This recreates the `shared-tmpfs` volume, so all existing data (created tables, running jobs, etc.) is lost.
:::
Make sure that the modified and added containers are up and running using
```shell
docker container ls -a
```
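A running container is not necessarily ready to serve requests, so it can help to probe the observability services over HTTP as well. The sketch below assumes the host ports published in the `docker-compose.yml` above (9092, 3102, 3002) and that `curl` is installed on the host. `probe` is a hypothetical helper; the paths are the standard readiness and health endpoints of Prometheus, Loki and Grafana.

```shell
# Probe one HTTP endpoint and report whether it answered with a success status.
# $1 = display name, $2 = URL (hypothetical helper for illustration)
probe() {
  if curl --silent --fail --max-time 5 --output /dev/null "$2"; then
    echo "UP $1"
  else
    echo "DOWN $1 ($2)"
    return 1
  fi
}

# Host ports as published in the docker-compose.yml of this guide.
probe prometheus "http://localhost:9092/-/ready"    || true
probe loki       "http://localhost:3102/ready"      || true
probe grafana    "http://localhost:3002/api/health" || true
```

Loki in particular may report `DOWN` for a short while after startup; retrying after a few seconds is usually enough.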
4. Now you are all set! You can visit
- Grafana to view Fluss logs with the [log explorer](http://localhost:3002/a/grafana-lokiexplore-app/) and observe metrics of the Fluss and Flink cluster with the [provided dashboards](http://localhost:3002/dashboards) or
- the [Prometheus Web UI](http://localhost:9092) to directly query Prometheus with [PromQL](https://prometheus.io/docs/prometheus/2.55/getting_started/).
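If you prefer the command line over the web UIs, both Prometheus and Loki can also be queried through their HTTP APIs. The following is a minimal sketch under the assumption that the port mappings from the `docker-compose.yml` above are unchanged and `curl` is available. `prom_query` and `loki_labels` are hypothetical helper names; `PROM_URL` and `LOKI_URL` are illustrative environment variables, not settings used by Fluss.

```shell
# Base URLs use the host ports published in the docker-compose.yml of this guide.
# Override them via the environment if you changed the port mappings.
PROM_URL="${PROM_URL:-http://localhost:9092}"
LOKI_URL="${LOKI_URL:-http://localhost:3102}"

# Run an instant PromQL query ($1) through the Prometheus HTTP API.
prom_query() {
  curl --silent --show-error --fail --max-time 10 --get \
    --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# List the label names that Loki has indexed so far.
loki_labels() {
  curl --silent --show-error --fail --max-time 10 "$LOKI_URL/loki/api/v1/labels"
}

# `up` is 1 for every scrape target Prometheus can reach and 0 otherwise.
prom_query 'up' || echo "Prometheus is not reachable at $PROM_URL"
loki_labels || echo "Loki is not reachable at $LOKI_URL"
```

Both calls return JSON. The `up` result shows which of the scrape targets configured in `prometheus.yml` are reachable, and the Loki label list shows which labels you can filter on in the Grafana log explorer.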