---
sidebar_position: 4
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Set Up with Kubernetes
This section provides a quick guide to using SeaTunnel with Kubernetes.
## Prerequisites
We assume that you have local installations of the following:
- [docker](https://docs.docker.com/)
- [kubernetes](https://kubernetes.io/)
- [helm](https://helm.sh/docs/intro/quickstart/)
With these installed, the `kubectl` and `helm` commands should be available on your local system.
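A quick way to confirm the prerequisites is to check that each command resolves on your `PATH`. The sketch below is only an illustration: the helper name `missing_tools` is hypothetical, and `minikube` is included in the default list on the assumption that you follow the minikube setup used in this guide.

```bash
# Print the name of every required tool that is not on PATH.
# With no arguments, checks the tools this guide relies on.
missing_tools() {
  if [ "$#" -eq 0 ]; then
    set -- docker kubectl helm minikube
  fi
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: abort early when something is missing.
#   missing="$(missing_tools)"
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```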
For Kubernetes we use [minikube](https://minikube.sigs.k8s.io/docs/start/); at the time of writing, we are running Kubernetes version v1.23.3. You can start a cluster with the following command:
```bash
minikube start --kubernetes-version=v1.23.3
```
## Installation
### SeaTunnel docker image
To run the image with SeaTunnel, first create a `Dockerfile`:
<Tabs
groupId="engine-type"
defaultValue="flink"
values={[
{label: 'Flink', value: 'flink'},
]}>
<TabItem value="flink">
```Dockerfile
FROM flink:1.13
ENV SEATUNNEL_VERSION="2.1.2"
ENV SEATUNNEL_HOME="/opt/seatunnel"
RUN mkdir -p $SEATUNNEL_HOME
RUN wget https://archive.apache.org/dist/incubator/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz
RUN cp -r apache-seatunnel-incubating-${SEATUNNEL_VERSION}/* $SEATUNNEL_HOME/
RUN rm -rf apache-seatunnel-incubating-${SEATUNNEL_VERSION}*
RUN rm -rf $SEATUNNEL_HOME/connectors/spark
```
Then run the following command to build the image:
```bash
docker build -t seatunnel:2.1.2-flink-1.13 -f Dockerfile .
```
The image `seatunnel:2.1.2-flink-1.13` needs to be present on the host (minikube) so that the deployment can take place.
Load the image into minikube via:
```bash
minikube image load seatunnel:2.1.2-flink-1.13
```
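To double-check that the image actually reached the cluster, you can search the output of `minikube image ls` for the tag. This is a minimal sketch: the helper names `seatunnel_image_tag` and `image_in_list` are hypothetical, and the substring match assumes the listing may prefix the tag with a registry path.

```bash
# Compose the image tag used in this guide from the SeaTunnel and Flink versions.
seatunnel_image_tag() {
  echo "seatunnel:$1-flink-$2"
}

# Succeed if the tag appears anywhere in an image listing read from stdin.
# Fixed-string substring match, so a registry prefix does not matter.
image_in_list() {
  grep -qF -- "$1"
}

# Usage (requires a running minikube cluster):
#   tag="$(seatunnel_image_tag 2.1.2 1.13)"
#   minikube image ls | image_in_list "$tag" && echo "$tag is loaded"
```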
</TabItem>
</Tabs>
### Deploying the operator
<Tabs
groupId="engine-type"
defaultValue="flink"
values={[
{label: 'Flink', value: 'flink'},
]}>
<TabItem value="flink">
The steps below provide a quick walk-through on setting up the Flink Kubernetes Operator.
Install the certificate manager on your Kubernetes cluster to enable adding the webhook component (only needed once per Kubernetes cluster):
```bash
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.yaml
```
Now you can deploy the latest stable Flink Kubernetes Operator version using the included Helm chart:
```bash
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-0.1.0/
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
```
You may verify your installation via `kubectl`:
```bash
kubectl get pods
NAME                                         READY   STATUS    RESTARTS      AGE
flink-kubernetes-operator-5f466b8549-mgchb   1/1     Running   3 (23h ago)   16d
```
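Instead of polling `kubectl get pods` by hand, you can block until the operator is ready with `kubectl wait`. The sketch below assumes the Deployment is named `flink-kubernetes-operator` (matching the Helm release and pod name above) and lives in the current namespace; the `run` and `wait_for_operator` helpers and the 120s default timeout are illustrative choices, not part of the operator.

```bash
# Run a command, or only print it when DRY_RUN=1 (handy for previewing).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Block until the operator Deployment reports the Available condition.
# Optional first argument: timeout (defaults to 120s, an arbitrary value).
wait_for_operator() {
  run kubectl wait --for=condition=Available \
    deployment/flink-kubernetes-operator --timeout="${1:-120s}"
}

# Usage:
#   DRY_RUN=1 wait_for_operator   # print the command only
#   wait_for_operator 300s        # actually wait, up to five minutes
```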
</TabItem>
</Tabs>
## Run SeaTunnel Application
**Run Application**: SeaTunnel already provides out-of-the-box [configurations](https://github.com/apache/incubator-seatunnel/tree/dev/config).
<Tabs
groupId="engine-type"
defaultValue="flink"
values={[
{label: 'Flink', value: 'flink'},
]}>
<TabItem value="flink">
In this guide we are going to use [flink.streaming.conf](https://github.com/apache/incubator-seatunnel/blob/dev/config/flink.streaming.conf.template):
```conf
env {
  execution.parallelism = 1
}
source {
  FakeSourceStream {
    result_table_name = "fake"
    field_name = "name,age"
  }
}
transform {
  sql {
    sql = "select name,age from fake"
  }
}
sink {
  ConsoleSink {}
}
```
This configuration needs to be available when we deploy the application (SeaTunnel) to the Flink cluster (on Kubernetes), so we also need to give the Pod access to it through a volume (a `hostPath` volume in this guide).
- Create `/mnt/data` on your Node. Open a shell to the single Node in your cluster. How you open a shell depends on how you set up your cluster. For example, since we are using minikube, you can open a shell to your Node by entering `minikube ssh`.
In your shell on that Node, create a `/mnt/data` directory:
```bash
minikube ssh
# This assumes that your Node uses "sudo" to run commands
# as the superuser
sudo mkdir /mnt/data
```
- Copy the application (SeaTunnel) configuration file to your Node:
```bash
minikube cp flink.streaming.conf /mnt/data/flink.streaming.conf
```
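The copy step above assumes that `flink.streaming.conf` already exists in your current directory. One way to create it is with a heredoc, as sketched below; the helper names `write_streaming_conf` and `braces_balanced` are hypothetical, and the brace check is only a cheap guard against a truncated copy, not a real configuration parser.

```bash
# Write the example job configuration to the given path
# (defaults to flink.streaming.conf in the current directory).
write_streaming_conf() {
  cat > "${1:-flink.streaming.conf}" <<'EOF'
env {
  execution.parallelism = 1
}
source {
  FakeSourceStream {
    result_table_name = "fake"
    field_name = "name,age"
  }
}
transform {
  sql {
    sql = "select name,age from fake"
  }
}
sink {
  ConsoleSink {}
}
EOF
}

# Succeed if the file contains as many "{" as "}" characters.
braces_balanced() {
  open="$(tr -cd '{' < "$1" | wc -c | tr -d ' ')"
  close="$(tr -cd '}' < "$1" | wc -c | tr -d ' ')"
  [ "$open" -eq "$close" ]
}

# Usage:
#   write_streaming_conf && braces_balanced flink.streaming.conf \
#     && minikube cp flink.streaming.conf /mnt/data/flink.streaming.conf
```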
Once the Flink Kubernetes Operator is running, as seen in the previous steps, you are ready to submit a Flink (SeaTunnel) job:
- Create the `seatunnel-flink.yaml` FlinkDeployment manifest:
```yaml
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.1.2-flink-1.13
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 2
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /data
              name: config-volume
      volumes:
        - name: config-volume
          hostPath:
            path: "/mnt/data"
            type: Directory
  job:
    jarURI: local:///opt/seatunnel/lib/seatunnel-core-flink.jar
    entryClass: org.apache.seatunnel.core.flink.SeatunnelFlink
    args: ["--config", "/data/flink.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless
```
- Run the example application:
```bash
kubectl apply -f seatunnel-flink.yaml
```
</TabItem>
</Tabs>
**See The Output**
<Tabs
groupId="engine-type"
defaultValue="flink"
values={[
{label: 'Flink', value: 'flink'},
]}>
<TabItem value="flink">
You can follow the logs of your job after a successful startup (which can take on the order of a minute in a fresh environment, and seconds afterwards):
```bash
kubectl logs -f deploy/seatunnel-flink-streaming-example
```
To expose the Flink Dashboard you may add a port-forward rule:
```bash
kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081
```
Now the Flink Dashboard is accessible at [localhost:8081](http://localhost:8081).
Or launch `minikube dashboard` for a web-based Kubernetes user interface.
The content printed in the TaskManager Stdout log:
```bash
kubectl logs \
-l 'app in (seatunnel-flink-streaming-example), component in (taskmanager)' \
--tail=-1 \
-f
```
The output looks like the following (your content may differ, since `FakeSourceStream` automatically generates random stream data):
```shell
+I[Kid Xiong, 1650316786086]
+I[Ricky Huo, 1650316787089]
+I[Ricky Huo, 1650316788089]
+I[Ricky Huo, 1650316789090]
+I[Kid Xiong, 1650316790090]
+I[Kid Xiong, 1650316791091]
+I[Kid Xiong, 1650316792092]
```
To stop your job and delete your FlinkDeployment, run:
```bash
kubectl delete -f seatunnel-flink.yaml
```
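If you also want to remove the operator and stop the local cluster once you are done, the steps can be grouped as in the sketch below. The function name `teardown` is hypothetical, and the sketch assumes the Helm release name `flink-kubernetes-operator` and the manifest file name used earlier in this guide. It leaves cert-manager installed.

```bash
# Undo the setup from this guide, in reverse order.
# Optional first argument: path to the FlinkDeployment manifest.
teardown() {
  # Stop the job and delete the FlinkDeployment.
  kubectl delete -f "${1:-seatunnel-flink.yaml}"
  # Remove the operator's Helm release.
  helm uninstall flink-kubernetes-operator
  # Stop the local cluster; its state stays on disk.
  minikube stop
}

# Usage:
#   teardown                        # uses seatunnel-flink.yaml
#   teardown path/to/manifest.yaml  # uses a different manifest file
```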
</TabItem>
</Tabs>
Happy SeaTunneling!
## What's More
Now that you have taken a quick look at SeaTunnel, see [connector](/category/connector) for all the sources and sinks that SeaTunnel supports.
Or see [deployment](../deployment.mdx) if you want to submit your application to another kind of engine cluster.