<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Doris compose
Use doris-compose to create Doris docker compose clusters.
## Requirements
### 1. Make sure you have docker permissions
Run:
```shell
docker run hello-world
```
If you get a permission-denied error, follow [add-docker-permission](https://docs.docker.com/engine/install/linux-postinstall/).
Also make sure BuildKit is configured on the machine; if not, follow [docker-with-BuildKit](https://docs.docker.com/build/buildkit/).
### 2. The doris image should contain
```shell
/opt/apache-doris/{fe, be, ms}
```
If you are not going to create cloud clusters, the image does not need to contain the ms package.
If you build doris with `sh build.sh --fe --be --cloud` **without changing any of the original conf files or shell scripts**, the resulting `output/` directory satisfies all of the above, and running the command below in the doris root directory will generate such an image. If you want to pack a product that is not the `output/` directory, modify the `Dockerfile` yourself.
```shell
docker build -f docker/runtime/doris-compose/Dockerfile -t <image> .
```
The Dockerfile uses JDK 17 by default; doris 2.1, 3.0, and master all default to JDK 17.
But doris 2.0 still uses JDK 8, so to build a 2.0 image you need to select JDK 8 with the build arg `JDK_IMAGE=openjdk:8u342-jdk`. Here is the command to build a 2.0 image:
```shell
docker build -f docker/runtime/doris-compose/Dockerfile \
--build-arg JDK_IMAGE=openjdk:8u342-jdk \
-t <image> .
```
The `<image>` is the name you want the docker image to have.
You can also download a doris release package from [Doris Home](https://doris.apache.org/docs/releasenotes/all-release) or [Doris Github](https://github.com/apache/doris/releases), extract it, and then build an image from it with the build arg `OUTPUT_PATH`,
for example:
```shell
cd ~/tmp
wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-3.0.5-bin-x64.tar.gz
tar xvf apache-doris-3.0.5-bin-x64.tar.gz # after extract, there will be a directory ./apache-doris-3.0.5-bin-x64/{fe, be, ms}
# -f: the Dockerfile file
# -t: the tag of the image to build
# . : current directory, here it's ~/tmp, then output path is ~/tmp/apache-doris-3.0.5-bin-x64
docker build \
--build-arg OUTPUT_PATH=./apache-doris-3.0.5-bin-x64 \
-f ~/workspace/doris/docker/runtime/doris-compose/Dockerfile \
-t my-doris:v3.0.5 \
.
```
### 3. Install the dependent python library in 'docker/runtime/doris-compose/requirements.txt'
A pinned `PyYAML` version does not always fit other libraries' requirements, so we suggest using an isolated environment such as `venv` or `conda`.
```shell
python -m pip install --user -r docker/runtime/doris-compose/requirements.txt
```
If the install fails, pin the library versions in `requirements.txt`, for example:
```text
pyyaml==5.3.1
docker==6.1.3
......
```
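The isolated environment suggested above can be set up with `venv` like this (the environment directory name is arbitrary):

```shell
# create and activate an isolated environment for doris-compose
python -m venv doris-compose-env
source doris-compose-env/bin/activate
# install the dependencies into the environment, not the system python
python -m pip install -r docker/runtime/doris-compose/requirements.txt
```

Run `deactivate` to leave the environment when you are done.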
## Usage
### Notice
Each cluster has a directory under '/tmp/doris/{cluster-name}'; set the env var `LOCAL_DORIS_PATH` to change the parent directory.
For example, if you export `LOCAL_DORIS_PATH=/mydoris`, the cluster's directory becomes '/mydoris/{cluster-name}'.
The cluster's directory contains all of its containers' logs and data, e.g. `fe-1`, `fe-2`, `be-1`, etc.
If multiple users run doris-compose on the same machine, they should either keep the default `LOCAL_DORIS_PATH` or all export the same value.
This is because, when creating a new cluster, doris-compose scans the local doris path and chooses a docker network different from those of the existing clusters under that path.
So if multiple users use different `LOCAL_DORIS_PATH` values, their clusters may have docker network conflicts!
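For instance, the resolved cluster directory can be checked like this (`/mydoris` and `my-cluster` are placeholder names):

```shell
# override the parent directory for all clusters (placeholder path)
export LOCAL_DORIS_PATH=/mydoris
# a cluster named my-cluster will then keep its data and logs here:
echo "$LOCAL_DORIS_PATH/my-cluster"
```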
### Create a cluster or recreate its containers
```shell
python docker/runtime/doris-compose/doris-compose.py up <cluster-name> <image?>
--add-fe-num <add-fe-num> --add-be-num <add-be-num>
[--fe-id <fe-id> --be-id <be-id>]
...
[ --cloud ]
[ --cluster-snapshot <cluster-snapshot-json> ]
```
If it's a new cluster, the image must be specified.
Add fe/be nodes with the given image, or update existing nodes with `--fe-id`, `--be-id`.
The `--cluster-snapshot` parameter lets you provide cluster snapshot JSON content for FE-1's first startup, in cloud mode only. The JSON will be written to the FE's conf/cluster_snapshot.json and passed to start_fe.sh via its --cluster_snapshot parameter. This is only effective on the first startup.
Example:
```shell
python docker/runtime/doris-compose/doris-compose.py up my-cluster my-image --cloud --cluster-snapshot '{"instance_id":"instance_id_xxx"}'
```
To create a cloud cluster, follow these steps:
1. Write the cloud s3 store config file; its default path is '/tmp/doris/cloud.ini'.
The path is defined by the environment variable `DORIS_CLOUD_CFG_FILE`; change this env var to change the path.
An example file is located at 'docker/runtime/doris-compose/resource/cloud.ini.example'.
2. Use doris compose up command with option `--cloud` to create a new cloud cluster.
The simplest way to create a cloud cluster:
```shell
python docker/runtime/doris-compose/doris-compose.py up <cluster-name> <image> --cloud
```
To create a cloud cluster with a custom cluster snapshot:
```shell
python docker/runtime/doris-compose/doris-compose.py up <cluster-name> <image> --cloud --cluster-snapshot '{"instance_id":"instance_id_xxx"}'
```
It will create 1 fdb, 1 meta service server, 1 recycler, 3 fe and 3 be.
### Remove node from the cluster
```shell
python docker/runtime/doris-compose/doris-compose.py down <cluster-name> --fe-id <fe-id> --be-id <be-id> [--clean] [--drop-force]
```
This stops the containers and removes the nodes from the cluster.
For a BE, if `--drop-force` is specified, a drop SQL is sent to the FE; otherwise a decommission SQL is sent.
If `--clean` is specified, the node's data is deleted as well.
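For example (the cluster name and node ids below are placeholders), a graceful decommission versus a forced drop might look like:

```shell
# decommission be-1: the FE migrates its data away before removal
python docker/runtime/doris-compose/doris-compose.py down my-cluster --be-id 1
# force-drop be-2 and also delete its local data directory
python docker/runtime/doris-compose/doris-compose.py down my-cluster --be-id 2 --drop-force --clean
```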
### Start, stop, restart specific nodes
```shell
python docker/runtime/doris-compose/doris-compose.py start <cluster-name> --fe-id <multiple fe ids> --be-id <multiple be ids>
python docker/runtime/doris-compose/doris-compose.py stop <cluster-name> --fe-id <multiple fe ids> --be-id <multiple be ids>
python docker/runtime/doris-compose/doris-compose.py restart <cluster-name> --fe-id <multiple fe ids> --be-id <multiple be ids>
```
### List doris cluster
```shell
python docker/runtime/doris-compose/doris-compose.py ls <multiple cluster names>
```
If cluster names are specified, it lists all of those clusters' nodes.
Otherwise it just lists a summary of each cluster.
There are more doris-compose options; just try
```shell
python docker/runtime/doris-compose/doris-compose.py <command> -h
```
### Docker suite in regression test
The regression test framework supports running a suite in a docker doris cluster.
See the example [demo_p0/docker_action.groovy](https://github.com/apache/doris/blob/master/regression-test/suites/demo_p0/docker_action.groovy).
A docker suite can specify the fe and be numbers, and can add/drop/start/stop/restart fe and be nodes.
Before running a docker suite, read the comments in `demo_p0/docker_action.groovy` carefully.
### Generate regression custom conf file
doris-compose provides a command that lets the regression test connect to a docker cluster.
```shell
python docker/runtime/doris-compose/doris-compose.py config <cluster-name> <doris-root-path> [-q] [--connect-follow-fe]
```
This generates regression-conf-custom.groovy for connecting to the specified docker cluster.
### Setup cloud multi clusters test env
steps:
1. Create a new cluster: `python docker/runtime/doris-compose/doris-compose.py up my-cluster my-image --add-fe-num 2 --add-be-num 4 --cloud`
2. Generate regression-conf-custom.groovy: `python docker/runtime/doris-compose/doris-compose.py config my-cluster <doris-root-path> --connect-follow-fe`
3. Run regression test: `bash run-regression-test.sh --run -times 1 -parallel 1 -suiteParallel 1 -d cloud/multi_cluster`
### Multi cloud cluster with shared Meta Service
Doris compose now supports creating multiple cloud clusters that share the same Meta Service (MS), FDB, and Recycler services. This is useful for testing cross-cluster operations (such as cloning, backup/restore) under the same Meta Service instance.
#### Create the first cluster
First, create a complete cloud cluster that will provide MS/FDB/Recycler services:
```shell
python docker/runtime/doris-compose/doris-compose.py up cluster1 <image> --cloud --add-fe-num 1 --add-be-num 3
```
This creates the first cluster with:
- 1 FDB node
- 1 Meta Service (MS) node
- 1 Recycler node
- 1 FE node
- 3 BE nodes
#### Create additional clusters sharing the same MS
Now you can create additional sql/compute clusters that share the first cluster's Meta Service:
```shell
# Create second cluster sharing cluster1's MS
python docker/runtime/doris-compose/doris-compose.py up cluster2 <image> --cloud --external-ms cluster1 --instance-id instance_cluster2 --add-fe-num 1 --add-be-num 3
# Create third cluster sharing cluster1's MS
python docker/runtime/doris-compose/doris-compose.py up cluster3 <image> --cloud --external-ms cluster1 --instance-id instance_cluster3 --add-fe-num 1 --add-be-num 3
```
Key points:
- `--external-ms cluster1`: Specifies that this cluster will use cluster1's MS/FDB/Recycler services
- `--instance-id`: Must be unique for each cluster. If not specified, it is auto-generated as `instance_<cluster-name>`
- The new clusters will NOT create their own MS/FDB/Recycler nodes, saving resources
- All clusters share the same object storage and meta service infrastructure
- Each cluster maintains its own FE/BE nodes for compute isolation
#### Network architecture
When using external MS:
- Each cluster has its own Docker network
- Compute clusters join the external MS cluster's network as well
- DNS resolution is configured automatically for all MS/FDB/Recycler nodes
- BE and FE nodes can communicate with MS nodes using their container names
#### Validation
Doris compose automatically validates:
1. External MS cluster exists
2. External cluster is a cloud cluster
3. MS and FDB nodes are present
4. MS and FDB containers are running
If validation fails, you'll get a clear error message explaining what needs to be fixed.
### Rollback Cloud Cluster to Snapshot
The rollback command allows you to roll back a cloud cluster to a specific snapshot state.
#### Basic Usage
```shell
python docker/runtime/doris-compose/doris-compose.py rollback <cluster-name> \
--cluster-snapshot '{"instance_id":"instance_xxx", ...}' \
[--instance-id NEW_INSTANCE_ID]
```
#### What it does
The rollback command performs the following operations on **ALL FE/BE nodes**:
1. **Stops** all FE and BE nodes
2. **Cleans** FE `doris-meta/` and BE `storage/` directories (preserves `conf/`, `log/`, etc.)
3. **Updates** every node's conf
4. **Restarts** all nodes with new `instance_id` and `cluster_snapshot`
#### Parameters
- `--cluster-snapshot` (required): Cluster snapshot JSON content
- Example: `'{"instance_id":"instance_id_xxx"}'`
- Will be written to FE-1's `conf/cluster_snapshot.json`
- `--instance-id` (optional): New instance ID after rollback
- If not specified, auto-generates: `instance_{cluster_name}_{timestamp}`
- `--wait-timeout` (optional): Wait seconds for nodes to be ready (default: 0)
#### Examples
**Full cluster rollback:**
```shell
python docker/runtime/doris-compose/doris-compose.py rollback my_cluster \
--cluster-snapshot '{"instance_id":"backup_instance", ...}' \
--wait-timeout 60
```
**Rollback with custom instance ID:**
```shell
python docker/runtime/doris-compose/doris-compose.py rollback my_cluster \
--cluster-snapshot '{"instance_id":"rollback_instance", ...}' \
--instance-id "prod_rollback_20251027"
```
## Problem investigation
### Log
Each cluster keeps its nodes' logs under '/tmp/doris/{cluster-name}/{node-xxx}/log/'. For each node, doris-compose also writes a health log to '/tmp/doris/{cluster-name}/{node-xxx}/log/health.out'.
### Core Dump
Doris Compose supports core dump generation for debugging purposes. When a process crashes, it will generate a core dump file that can be analyzed with tools like gdb.
#### Core Dump Location
Core dump files are generated in the following locations:
- **Host System**: `/tmp/doris/{cluster-name}/{node-xxx}/core_dump/`
- **Container**: `/opt/apache-doris/core_dump/`
The core dump files follow the pattern: `core.{executable}.{pid}.{timestamp}`
For example:
```
/tmp/doris/my-cluster/be-1/core_dump/core.doris_be.12345.1755418335
```
#### Core Pattern Configuration
The system uses the core pattern from `/proc/sys/kernel/core_pattern` on the host system. The default pattern is:
```
/opt/apache-doris/core_dump/core.%e.%p.%t
```
Where:
- `%e`: executable name
- `%p`: process ID
- `%t`: timestamp
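As a quick sketch, the fields of a core file name produced by this pattern can be split apart in the shell (the sample file name below is hypothetical):

```shell
# decode core.%e.%p.%t into its parts
f=core.doris_be.12345.1755418335
exe=$(echo "$f" | cut -d. -f2)   # executable name (%e)
pid=$(echo "$f" | cut -d. -f3)   # process id (%p)
ts=$(echo "$f" | cut -d. -f4)    # unix timestamp (%t)
echo "executable=$exe pid=$pid timestamp=$ts"
```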
#### Core Dump Settings
Doris Compose automatically configures the following settings for core dump generation:
1. **Container Settings**:
- `ulimits.core = -1` (unlimited core file size)
- `cap_add: ["SYS_ADMIN"]` (required capabilities)
- `privileged: true` (privileged mode)
2. **Directory Permissions**:
- Core dump directory is created with 777 permissions
- Ownership is set to the host user for non-root containers
3. **Non-Root User Support**:
- Core dump directory permissions are automatically configured
- Works with both root and non-root user containers
#### Troubleshooting
If core dumps are not being generated:
1. **Check ulimit settings**:
```bash
ulimit -c
# Should return "unlimited" or a positive number
```
2. **Check directory permissions**:
```bash
ls -la /tmp/doris/{cluster-name}/{node-xxx}/core_dump/
# Should show 777 permissions
```
3. **Check core pattern**:
```bash
cat /proc/sys/kernel/core_pattern
# Should show the expected pattern
```
4. **Check container logs**:
```bash
docker logs {container-name}
# Look for core dump related messages
```
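If step 1 shows that core dumps are disabled on the host, they can be enabled before starting the cluster; a sketch (writing the core pattern requires root, and the pattern below is the one quoted above):

```shell
# allow unlimited core file size in the current shell
ulimit -c unlimited
# set the expected core pattern on the host (needs root)
echo '/opt/apache-doris/core_dump/core.%e.%p.%t' | sudo tee /proc/sys/kernel/core_pattern
```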
### Up cluster using non-detach mode
```shell
python docker/runtime/doris-compose/doris-compose.py up ... -no-detach
```
## Developer
Before submitting code, please format it:
```shell
bash format-code.sh
```