Doris-Operator supports mounting PV (Persistent Volume) on pods of various Doris components.
PVs are generally created by the Kubernetes system administrator. Doris-Operator does not use PVs directly when deploying Doris services. Instead, it declares its storage requirements in a PVC (PersistentVolumeClaim) to request a PV from the Kubernetes cluster. When a PVC is created, Kubernetes attempts to bind it to an available PV that meets the requirements. A StorageClass spares administrators from creating PVs manually: when no existing PV satisfies a PVC, a PV can be provisioned dynamically based on the StorageClass. PVs are backed by a variety of storage types, which fall into two main categories: network storage and local storage. Because they rest on different principles and implementations, the two differ in performance and in how they are used. Choose between them according to the type of containerized service and your own requirements.
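As a rough illustration of the PVC mechanism described above, here is a minimal sketch of a standalone PersistentVolumeClaim written against the standard Kubernetes API. The name doris-demo-pvc and the ${your_storageclass} placeholder are assumptions chosen for illustration. With Doris-Operator you normally do not write such an object by hand, because the claim is declared through persistentVolumeClaimSpec in the DorisCluster resource.

```yaml
# Minimal sketch of a PVC. All names here are hypothetical placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: doris-demo-pvc
spec:
  # Omit storageClassName to fall back to the cluster's default StorageClass.
  storageClassName: ${your_storageclass}
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```

If no existing PV satisfies this claim and the named StorageClass has a provisioner, Kubernetes can provision a matching PV dynamically.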
If no PVC is configured during deployment, Doris-Operator defaults to emptyDir mode for storing metadata, data files, and runtime logs. When the pod is restarted, this data is lost.
Recommended directories to persist, by node type (paths as used in the examples on this page): for FE, /opt/apache-doris/fe/doris-meta and /opt/apache-doris/fe/log; for BE, /opt/apache-doris/be/storage and /opt/apache-doris/be/log.
Doris-Operator outputs logs to both the console and a specified directory. If your Kubernetes cluster has complete log collection capabilities, Doris logs at the INFO level (the default) can be collected from console output. It is still recommended to configure a PVC to persist log files, because in addition to INFO-level logs there are also fe.out, be.out, audit.log, and garbage collection logs. Persisting them makes it easier to locate problems quickly and to trace back through audit logs.
ConfigMap is a resource object used to store configuration files in Kubernetes. It allows dynamically mounting configuration files and decouples configuration files from applications, making configuration management more flexible and maintainable. Like PVCs, ConfigMap can be referenced by Pods in order to use configuration data in the application.
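To make the idea of a Pod referencing a ConfigMap concrete, the following is a generic Kubernetes sketch, not something Doris-Operator requires you to write. The names configmap-demo, demo-configmap, app.conf, and the busybox image are assumptions for illustration only. For Doris components, the operator sets up the equivalent mount from configMapInfo.

```yaml
# Generic sketch: a Pod that mounts every key of a ConfigMap as a file.
# All names are hypothetical; demo-configmap must already exist in the
# same namespace and contain a key named app.conf.
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "cat /etc/demo/app.conf && sleep 3600"]
    volumeMounts:
    - name: conf
      mountPath: /etc/demo
  volumes:
  - name: conf
    configMap:
      name: demo-configmap
```

Each key in the ConfigMap's data appears as a file under the mountPath, which is why the file name used as the key matters.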
By default, Doris-Operator uses the Kubernetes default StorageClass for FE and BE data storage, and the storage path (mountPath) uses the default configuration in the image. To specify a StorageClass yourself, set persistentVolumeClaimSpec.storageClassName under spec.feSpec.persistentVolumes (and likewise under spec.beSpec.persistentVolumes), as shown below:
apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-storageclass1
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          # Note: if the storage size is less than 5G, FE will not start normally.
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/fe/log
      name: storage1
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/log
      name: storage3
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, set storageClassName to its name.
        storageClassName: ${your_storageclass}
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
Doris uses Kubernetes ConfigMaps to decouple configuration files from services. Before deploying a DorisCluster, deploy the ConfigMaps it references in the same namespace. In the following example, FE uses a ConfigMap named fe-configmap and BE uses a ConfigMap named be-configmap. The related YAML follows:
ConfigMap sample for FE
apiVersion: v1
kind: ConfigMap
metadata:
  name: fe-configmap
  labels:
    app.kubernetes.io/component: fe
data:
  fe.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`
    # the output dir of stderr and stdout
    LOG_DIR = ${DORIS_HOME}/log
    JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE"
    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$CUR_DATE:time"
    # INFO, WARN, ERROR, FATAL
    sys_log_level = INFO
    # NORMAL, BRIEF, ASYNC
    sys_log_mode = NORMAL
    # Default dirs to put jdbc drivers,default value is ${DORIS_HOME}/jdbc_drivers
    # jdbc_drivers_dir = ${DORIS_HOME}/jdbc_drivers
    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010
    enable_fqdn_mode = true
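Note that when FE uses a ConfigMap, fe.conf must contain enable_fqdn_mode = true. Pod IPs can change when pods are recreated, so FE nodes need to identify each other by domain name rather than by IP. For details, refer to the linked document.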
BE's ConfigMap sample
apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`
    PPROF_TMPDIR="$DORIS_HOME/log/"
    JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
    # since 1.2, the JAVA_HOME need to be set to run BE process.
    # JAVA_HOME=/path/to/jdk/
    # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
    # https://jemalloc.net/jemalloc.3.html
    JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
    JEMALLOC_PROF_PRFIX=""
    # INFO, WARNING, ERROR, FATAL
    sys_log_level = INFO
    # ports for admin, web, heartbeat service
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060
Example DorisCluster deployment that uses the two ConfigMaps above (the broker-configmap it also references must be created in the same way beforehand):
apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-configmap
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      # use kubectl create configmap fe-configmap --from-file=fe.conf
      configMapName: fe-configmap
      resolveKey: fe.conf
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      # use kubectl create configmap be-configmap --from-file=be.conf
      configMapName: be-configmap
      resolveKey: be.conf
  brokerSpec:
    replicas: 3
    image: selectdb/doris.broker-ubuntu:2.0.2
    limits:
      cpu: 2
      memory: 4Gi
    requests:
      cpu: 2
      memory: 4Gi
    configMapInfo:
      # use kubectl create configmap broker-configmap --from-file=apache_hdfs_broker.conf
      configMapName: broker-configmap
      resolveKey: apache_hdfs_broker.conf
Here, resolveKey is the name of the configuration file passed in through the ConfigMap. It must be fe.conf, be.conf, or apache_hdfs_broker.conf (CN nodes also use be.conf). Doris-Operator parses this file to guide the customized deployment of the DorisCluster.
This section is a reference for containerized deployments that need additional files placed in the conf directory of a Doris node, for example the HDFS/Hive configuration files commonly mapped for Data Lake Multi-Catalog.
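Since the text notes that CN nodes also resolve be.conf, the fragment below sketches what that might look like. It assumes that the DorisCluster resource exposes a cnSpec section accepting configMapInfo in the same form as beSpec. The ConfigMap name cn-configmap and the ${your_cn_image} placeholder are hypothetical, so check the CRD of your operator version before relying on this.

```yaml
# Fragment of a DorisCluster spec (sketch under the assumptions stated above).
spec:
  cnSpec:
    replicas: 1
    # Hypothetical placeholder; substitute the image you actually use for CN.
    image: ${your_cn_image}
    configMapInfo:
      # cn-configmap is a hypothetical name; it must exist in the same namespace.
      configMapName: cn-configmap
      # CN nodes use be.conf as the resolve key, as noted in the text above.
      resolveKey: be.conf
```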
Here we take BE's ConfigMap and the core-site.xml file that needs to be added as an example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060
  core-site.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>
  ...
Note that data is a mapping of key-value pairs, where each key is a file name and each value is that file's content:
data:
  file_name_1: file_content_1
  file_name_2: file_content_2
  file_name_3: file_content_3
Doris' BE service supports mounting multiple disks, which addresses the mismatch between computing and storage resources that was common in the physical-server era. Using multiple disks can also greatly improve Doris' storage efficiency. On Kubernetes, Doris can likewise mount multiple disks, and doing so requires a configuration file. To decouple services from configuration, Doris uses a ConfigMap to carry the configuration and to mount the configuration file dynamically for the service. The following DorisCluster configuration has the BE service take its configuration file from a ConfigMap and mount two disks for BE storage:
apiVersion: doris.selectdb.com/v1
kind: DorisCluster
metadata:
  labels:
    app.kubernetes.io/name: doriscluster
  name: doriscluster-sample-storageclass1
spec:
  feSpec:
    replicas: 3
    image: selectdb/doris.fe-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    persistentVolumes:
    - mountPath: /opt/apache-doris/fe/doris-meta
      name: storage0
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it, as in the example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          # Note: if the storage size is less than 5G, FE will not start normally.
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/fe/log
      name: storage1
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it, as in the example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
  beSpec:
    replicas: 3
    image: selectdb/doris.be-ubuntu:2.0.2
    limits:
      cpu: 8
      memory: 16Gi
    requests:
      cpu: 8
      memory: 16Gi
    configMapInfo:
      configMapName: be-configmap
      resolveKey: be.conf
    persistentVolumes:
    - mountPath: /opt/apache-doris/be/storage
      name: storage2
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it, as in the example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/storage1
      name: storage3
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it, as in the example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
    - mountPath: /opt/apache-doris/be/log
      name: storage4
      persistentVolumeClaimSpec:
        # To use a specific StorageClass, uncomment storageClassName and set it, as in the example below.
        #storageClassName: openebs-jiva-csi-default
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
Compared with the default example, this one adds configMapInfo and one more persistentVolumes entry for BE. Each persistentVolumeClaimSpec follows the format of the native Kubernetes PVC spec. In the example, configMapInfo identifies which ConfigMap in the same namespace, and which key within it, supplies the configuration file once BE is deployed. The key must be be.conf. The following is the ConfigMap that must be deployed in advance for the DorisCluster above:
apiVersion: v1
kind: ConfigMap
metadata:
  name: be-configmap
  labels:
    app.kubernetes.io/component: be
data:
  be.conf: |
    CUR_DATE=`date +%Y%m%d-%H%M%S`
    PPROF_TMPDIR="$DORIS_HOME/log/"
    JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
    # For jdk 9+, this JAVA_OPTS will be used as default JVM options
    JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
    # since 1.2, the JAVA_HOME need to be set to run BE process.
    # JAVA_HOME=/path/to/jdk/
    # https://github.com/apache/doris/blob/master/docs/zh-CN/community/developer-guide/debug-tool.md#jemalloc-heap-profile
    # https://jemalloc.net/jemalloc.3.html
    JEMALLOC_CONF="percpu_arena:percpu,background_thread:true,metadata_thp:auto,muzzy_decay_ms:15000,dirty_decay_ms:15000,oversize_threshold:0,lg_tcache_max:20,prof:false,lg_prof_interval:32,lg_prof_sample:19,prof_gdump:false,prof_accum:false,prof_leak:false,prof_final:false"
    JEMALLOC_PROF_PRFIX=""
    # INFO, WARNING, ERROR, FATAL
    sys_log_level = INFO
    # ports for admin, web, heartbeat service
    be_port = 9060
    webserver_port = 8040
    heartbeat_service_port = 9050
    brpc_port = 8060
    storage_root_path = /opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd
When using multiple disks, each path in the storage_root_path value in the ConfigMap must match the mountPath of a persistentVolumes entry in the DorisCluster. For the syntax of storage_root_path, refer to the linked document. When using cloud disks, set the medium uniformly to SSD.
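The pairing rule can be seen at a glance in the condensed sketch below, which places the two sides next to each other. It is abbreviated for illustration: the persistentVolumeClaimSpec of each entry and the rest of be.conf are omitted, and the fragments are not deployable on their own.

```yaml
# DorisCluster side: fragment of spec.beSpec.persistentVolumes (abbreviated).
persistentVolumes:
- mountPath: /opt/apache-doris/be/storage     # first path in storage_root_path
  name: storage2
- mountPath: /opt/apache-doris/be/storage1    # second path in storage_root_path
  name: storage3
---
# ConfigMap side: fragment of data (abbreviated).
data:
  be.conf: |
    storage_root_path = /opt/apache-doris/be/storage,medium:ssd;/opt/apache-doris/be/storage1,medium:ssd
```

The log mount (/opt/apache-doris/be/log in the full example) is deliberately absent from storage_root_path, since it holds logs rather than BE data.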