Foreword

This document describes the dolphinscheduler configuration files and applies to version dolphinscheduler-1.3.x.

Directory Structure

All configuration files of dolphinscheduler are currently in the [conf] directory.

For a more intuitive understanding of the location of the [conf] directory and the configuration files it contains, please see the simplified description of the dolphinscheduler installation directory below.

This article mainly covers the dolphinscheduler configuration files; the other parts of the installation directory are not described in detail.

[Note: In the following, dolphinscheduler is referred to as DS.]


├─bin                                DS command storage directory
│  ├─dolphinscheduler-daemon.sh      Activate/deactivate DS service script
│  ├─start-all.sh                    Start all DS services according to the configuration file
│  ├─stop-all.sh                     Stop all DS services according to the configuration file
├─conf                               Configuration file directory
│  ├─application-api.properties      api service configuration file
│  ├─datasource.properties           Database configuration file
│  ├─zookeeper.properties            zookeeper configuration file
│  ├─master.properties               Master service configuration file
│  ├─worker.properties               Worker service configuration file
│  ├─quartz.properties               Quartz service configuration file
│  ├─common.properties               Common [storage] configuration file
│  ├─alert.properties                alert service configuration file
│  ├─config                          Environment variable configuration folder
│  │  ├─install_config.conf          DS environment variable configuration script [for DS installation/startup]
│  ├─env                             Run-script environment variable configuration directory
│  │  ├─dolphinscheduler_env.sh      Environment variable configuration file loaded by the run scripts [such as: JAVA_HOME, HADOOP_HOME, HIVE_HOME ...]
│  ├─org                             mybatis mapper file directory
│  ├─i18n                            i18n configuration file directory
│  ├─logback-api.xml                 api service log configuration file
│  ├─logback-master.xml              Master service log configuration file
│  ├─logback-worker.xml              Worker service log configuration file
│  ├─logback-alert.xml               alert service log configuration file
├─sql                                DS metadata creation and upgrade SQL file directory
│  ├─create                          Create SQL script directory
│  ├─upgrade                         Upgrade SQL script directory
│  ├─dolphinscheduler-postgre.sql    PostgreSQL database initialization script
│  ├─dolphinscheduler_mysql.sql      MySQL database initialization script
│  ├─soft_version                    Current DS version identification file
├─script                             DS service deployment and database creation/upgrade script directory
│  ├─create-dolphinscheduler.sh      DS database initialization script
│  ├─upgrade-dolphinscheduler.sh     DS database upgrade script
│  ├─monitor-server.sh               DS service monitoring startup script
│  ├─scp-hosts.sh                    Install file transfer script
│  ├─remove-zk-node.sh               Script to clean up Zookeeper cache files
├─ui                                 Front-end WEB resource directory
├─lib                                DS dependency jar storage directory
├─install.sh                         Automatic DS service installation script

Detailed configuration file

| Serial number | Service classification | Configuration file |
|--|--|--|
| 1 | Activate/deactivate DS service script | dolphinscheduler-daemon.sh |
| 2 | Database connection configuration | datasource.properties |
| 3 | Zookeeper connection configuration | zookeeper.properties |
| 4 | Common [storage] configuration | common.properties |
| 5 | API service configuration | application-api.properties |
| 6 | Master service configuration | master.properties |
| 7 | Worker service configuration | worker.properties |
| 8 | Alert service configuration | alert.properties |
| 9 | Quartz configuration | quartz.properties |
| 10 | DS environment variable configuration script [for DS installation/startup] | install_config.conf |
| 11 | Environment variable configuration file loaded by the run scripts [for example: JAVA_HOME, HADOOP_HOME, HIVE_HOME ...] | dolphinscheduler_env.sh |
| 12 | Service log configuration files | api: logback-api.xml, Master: logback-master.xml, Worker: logback-worker.xml, alert: logback-alert.xml |

1.dolphinscheduler-daemon.sh [Activate/deactivate DS service script]

The dolphinscheduler-daemon.sh script is responsible for starting and stopping DS services; start-all.sh and stop-all.sh ultimately start and stop the cluster through dolphinscheduler-daemon.sh. At present, DS ships with only a basic default setting, so please tune the JVM parameters according to the actual resources of your environment.

The default simplified parameters are as follows:

export DOLPHINSCHEDULER_OPTS="
-server 
-Xmx16g 
-Xms1g 
-Xss512k 
-XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled 
-XX:+UseFastAccessorMethods 
-XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=70
"

It is not recommended to set "-XX:+DisableExplicitGC": DS uses Netty for communication, and setting this parameter may cause memory leaks.

2.datasource.properties [Database Connectivity]

DS uses Druid to manage database connections. The default simplified configuration is as follows.

| Parameter | Defaults | Description |
|--|--|--|
| spring.datasource.driver-class-name | | Database driver |
| spring.datasource.url | | Database connection address |
| spring.datasource.username | | Database username |
| spring.datasource.password | | Database password |
| spring.datasource.initialSize | 5 | Initial connection pool size |
| spring.datasource.minIdle | 5 | Minimum connection pool size |
| spring.datasource.maxActive | 5 | Maximum connection pool size |
| spring.datasource.maxWait | 60000 | Maximum wait time |
| spring.datasource.timeBetweenEvictionRunsMillis | 60000 | Connection detection cycle |
| spring.datasource.timeBetweenConnectErrorMillis | 60000 | Retry interval |
| spring.datasource.minEvictableIdleTimeMillis | 300000 | Minimum time a connection remains idle without being evicted |
| spring.datasource.validationQuery | SELECT 1 | SQL used to check whether a connection is valid |
| spring.datasource.validationQueryTimeout | 3 | Timeout for checking whether a connection is valid [seconds] |
| spring.datasource.testWhileIdle | true | When applying for a connection, if its idle time is greater than timeBetweenEvictionRunsMillis, run validationQuery to check whether the connection is valid |
| spring.datasource.testOnBorrow | true | Execute validationQuery to check whether the connection is valid when applying for a connection |
| spring.datasource.testOnReturn | false | Execute validationQuery to check whether the connection is valid when returning a connection |
| spring.datasource.defaultAutoCommit | true | Whether to enable auto-commit |
| spring.datasource.keepAlive | true | For connections within the minIdle count in the pool, if the idle time exceeds minEvictableIdleTimeMillis, perform the keepAlive operation |
| spring.datasource.poolPreparedStatements | true | Enable PSCache |
| spring.datasource.maxPoolPreparedStatementPerConnectionSize | 20 | To enable PSCache this must be greater than 0; when it is greater than 0, poolPreparedStatements is automatically set to true |
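
For reference, a minimal MySQL-backed configuration might look like the sketch below; the host, database name, and credentials are placeholders to replace with your own.

# sketch: placeholder connection settings for an assumed MySQL metadata database
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://192.168.xx.xx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=xx
spring.datasource.password=xx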

3.zookeeper.properties [Zookeeper connection configuration]

| Parameter | Defaults | Description |
|--|--|--|
| zookeeper.quorum | localhost:2181 | zk cluster connection information |
| zookeeper.dolphinscheduler.root | /dolphinscheduler | Root directory used by DS in zookeeper |
| zookeeper.session.timeout | 60000 | Session timeout |
| zookeeper.connection.timeout | 30000 | Connection timeout |
| zookeeper.retry.base.sleep | 100 | Base retry interval |
| zookeeper.retry.max.sleep | 30000 | Maximum retry interval |
| zookeeper.retry.maxtime | 10 | Maximum number of retries |
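
For a three-node Zookeeper ensemble, for example, the connection information might be configured as in the sketch below; the addresses are placeholders.

# sketch: assumed three-node zk ensemble; replace the addresses with your own
zookeeper.quorum=192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181
zookeeper.dolphinscheduler.root=/dolphinscheduler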

4.common.properties [hadoop, s3, yarn configuration]

The common.properties configuration file is currently mainly used for hadoop/s3a related configuration.

| Parameter | Defaults | Description |
|--|--|--|
| resource.storage.type | NONE | Resource file storage type: HDFS, S3, NONE |
| resource.upload.path | /dolphinscheduler | Resource file storage path |
| data.basedir.path | /tmp/dolphinscheduler | Local working directory for storing temporary files |
| hadoop.security.authentication.startup.state | false | Whether hadoop kerberos authentication is enabled |
| java.security.krb5.conf.path | /opt/krb5.conf | kerberos configuration file path |
| login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos login user |
| login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos login user keytab |
| resource.view.suffixs | txt,log,sh,conf,cfg,py,java,sql,hql,xml,properties | File formats supported by the resource center |
| hdfs.root.user | hdfs | If the storage type is HDFS, configure a user with the corresponding operation permissions |
| fs.defaultFS | hdfs://mycluster:8020 | Request address. If resource.storage.type=S3, the value is similar to s3a://dolphinscheduler. If resource.storage.type=HDFS and hadoop is configured with HA, you need to copy the core-site.xml and hdfs-site.xml files to the conf directory |
| fs.s3a.endpoint | | s3 endpoint address |
| fs.s3a.access.key | | s3 access key |
| fs.s3a.secret.key | | s3 secret key |
| yarn.resourcemanager.ha.rm.ids | | yarn resourcemanager address. If the resourcemanager has HA enabled, enter the HA IP addresses (separated by commas). If the resourcemanager is a single node, the value can be empty |
| yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | If the resourcemanager has HA enabled or the resourcemanager is not used, keep the default value. If the resourcemanager is a single node, replace ds1 with the hostname of the resourcemanager |
| dolphinscheduler.env.path | env/dolphinscheduler_env.sh | Environment variable configuration file loaded by the run scripts [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...] |
| development.state | false | Whether development mode is enabled |
| kerberos.expire.time | 7 | kerberos expiry time, an integer, in days |
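
As an illustration, enabling an HDFS-backed resource center without kerberos might look like the sketch below; the cluster address is a placeholder.

# sketch: assumed HDFS resource center without kerberos; fs.defaultFS is a placeholder
resource.storage.type=HDFS
resource.upload.path=/dolphinscheduler
fs.defaultFS=hdfs://mycluster:8020
hdfs.root.user=hdfs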

5.application-api.properties [API service configuration]

| Parameter | Defaults | Description |
|--|--|--|
| server.port | 12345 | API service communication port |
| server.servlet.session.timeout | 7200 | Session timeout |
| server.servlet.context-path | /dolphinscheduler | Request path |
| spring.servlet.multipart.max-file-size | 1024MB | Maximum upload file size |
| spring.servlet.multipart.max-request-size | 1024MB | Maximum request size |
| server.jetty.max-http-post-size | 5000000 | Maximum Jetty POST request size |
| spring.messages.encoding | UTF-8 | Request encoding |
| spring.jackson.time-zone | GMT+8 | Time zone |
| spring.messages.basename | i18n/messages | i18n configuration |
| security.authentication.type | PASSWORD | Permission verification type |
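
For example, moving the API service to another port and raising the upload limits might look like the sketch below; the values are illustrative placeholders, not recommendations.

# illustrative overrides; the port and sizes are placeholders
server.port=54321
spring.servlet.multipart.max-file-size=2048MB
spring.servlet.multipart.max-request-size=2048MB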

6.master.properties [Master service configuration]

| Parameter | Defaults | Description |
|--|--|--|
| master.listen.port | 5678 | Master listen port |
| master.exec.threads | 100 | Number of master execution threads, limiting the number of process instances run in parallel |
| master.exec.task.num | 20 | Number of tasks the master executes in parallel per process instance |
| master.dispatch.task.num | 3 | Number of tasks the master dispatches per batch |
| master.host.selector | LowerWeight | Selector the master uses to pick a suitable worker. Optional values: Random, RoundRobin, LowerWeight |
| master.heartbeat.interval | 10 | Master heartbeat interval, in seconds |
| master.task.commit.retryTimes | 5 | Number of retries when the master commits a task |
| master.task.commit.interval | 1000 | Master task commit interval, in milliseconds |
| master.max.cpuload.avg | -1 | Master max cpu load average; the master can schedule only when the system cpu load average is lower than this value. Default value -1: the number of cpu cores * 2 |
| master.reserved.memory | 0.3 | Master reserved memory; the master can schedule only when available system memory is higher than this value. Default value 0.3, the unit is G |
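
As a hedged example, a smaller master host might lower the thread count and set explicit scheduling thresholds; the values below are illustrative, not recommendations.

# illustrative tuning for an assumed smaller master host; values are placeholders
master.exec.threads=50
master.max.cpuload.avg=8
master.reserved.memory=1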

7.worker.properties [Worker service configuration]

| Parameter | Defaults | Description |
|--|--|--|
| worker.listen.port | 1234 | Worker listen port |
| worker.exec.threads | 100 | Number of worker execution threads, limiting the number of task instances run in parallel |
| worker.heartbeat.interval | 10 | Worker heartbeat interval, in seconds |
| worker.max.cpuload.avg | -1 | Worker max cpu load average; tasks can be dispatched to the worker only when the system cpu load average is lower than this value. Default value -1: the number of cpu cores * 2 |
| worker.reserved.memory | 0.3 | Worker reserved memory; tasks can be dispatched to the worker only when available system memory is higher than this value. Default value 0.3, the unit is G |
| worker.groups | default | Worker groups, separated by commas, like 'worker.groups=default,test'. The worker joins the corresponding groups according to this configuration at startup |
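
For example, to register a worker in both the default group and a dedicated test group, one might configure the sketch below; the group names are placeholders.

# sketch with placeholder group names; the worker joins both groups at startup
worker.groups=default,test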

8.alert.properties [Alert service configuration]

| Parameter | Defaults | Description |
|--|--|--|
| alert.type | EMAIL | Alarm type |
| mail.protocol | SMTP | Mail server protocol |
| mail.server.host | xxx.xxx.com | Mail server address |
| mail.server.port | 25 | Mail server port |
| mail.sender | xxx@xxx.com | Sender mailbox |
| mail.user | xxx@xxx.com | Sender mailbox username |
| mail.passwd | 111111 | Sender mailbox password |
| mail.smtp.starttls.enable | true | Whether TLS is enabled for the mailbox |
| mail.smtp.ssl.enable | false | Whether SSL is enabled for the mailbox |
| mail.smtp.ssl.trust | xxx.xxx.com | Mailbox SSL whitelist |
| xls.file.path | /tmp/xls | Temporary working directory for mail attachments |
| The following is the enterprise WeChat configuration [optional] | | |
| enterprise.wechat.enable | false | Whether enterprise WeChat is enabled |
| enterprise.wechat.corp.id | xxxxxxx | |
| enterprise.wechat.secret | xxxxxxx | |
| enterprise.wechat.agent.id | xxxxxxx | |
| enterprise.wechat.users | xxxxxxx | |
| enterprise.wechat.token.url | https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=$corpId&corpsecret=$secret | |
| enterprise.wechat.push.url | https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=$token | |
| enterprise.wechat.user.send.msg | | Send message format |
| enterprise.wechat.team.send.msg | | Group message format |
| plugin.dir | /Users/xx/your/path/to/plugin/dir | Plugin directory |
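
A minimal sketch of an SMTP-only email alert setup is shown below; the server, sender, user, and password are placeholders to replace with your own.

# sketch: placeholder SMTP settings; replace host, sender, user, and password
alert.type=EMAIL
mail.protocol=SMTP
mail.server.host=smtp.exmail.qq.com
mail.server.port=25
mail.sender=xxx@xxx.com
mail.user=xxx@xxx.com
mail.passwd=xxxxxx
mail.smtp.starttls.enable=true
mail.smtp.ssl.enable=false
mail.smtp.ssl.trust=smtp.exmail.qq.com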

9.quartz.properties [Quartz configuration]

This file mainly contains Quartz configuration; please configure it in combination with your actual business scenario and resources. This article does not expand on it for now.

| Parameter | Defaults | Description |
|--|--|--|
| org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate | |
| org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate | |
| org.quartz.scheduler.instanceName | DolphinScheduler | |
| org.quartz.scheduler.instanceId | AUTO | |
| org.quartz.scheduler.makeSchedulerThreadDaemon | true | |
| org.quartz.jobStore.useProperties | false | |
| org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool | |
| org.quartz.threadPool.makeThreadsDaemons | true | |
| org.quartz.threadPool.threadCount | 25 | |
| org.quartz.threadPool.threadPriority | 5 | |
| org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX | |
| org.quartz.jobStore.tablePrefix | QRTZ_ | |
| org.quartz.jobStore.isClustered | true | |
| org.quartz.jobStore.misfireThreshold | 60000 | |
| org.quartz.jobStore.clusterCheckinInterval | 5000 | |
| org.quartz.jobStore.acquireTriggersWithinLock | true | |
| org.quartz.jobStore.dataSource | myDs | |
| org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider | |
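
Note that org.quartz.jobStore.driverDelegateClass appears twice in the table: only one value should be active for a given metadata database. As a sketch, an assumed MySQL-backed deployment would keep StdJDBCDelegate and comment out the PostgreSQL delegate:

# sketch for an assumed MySQL metadata database: keep StdJDBCDelegate active
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
#org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate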

10.install_config.conf [DS environment variable configuration script [for DS installation/startup]]

The install_config.conf configuration file is relatively involved. It is used in two main places.

  • 1. Automatic installation of the DS cluster.

Calling the install.sh script automatically loads the configuration in this file and, based on its contents, automatically configures the contents of the configuration files described above, such as dolphinscheduler-daemon.sh, datasource.properties, zookeeper.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.

  • 2. DS cluster startup and shutdown.

When the DS cluster is started or shut down, the masters, workers, alertServer, apiServers and other parameters in this configuration file are loaded to start/stop the DS cluster.

The contents of the file are as follows:


# Note: If the configuration file contains special characters, such as: `.*[]^${}\+?|()@#&`, please escape them.
# Example: `[` escapes to `\[`

# Database type, currently only supports postgresql or mysql
dbtype="mysql"

# Database address & port
dbhost="192.168.xx.xx:3306"

# Database name
dbname="dolphinscheduler"

# Database username
username="xx"

# Database password
password="xx"

# Zookeeper address
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"

# Where to install DS, such as: /data1_1T/dolphinscheduler
installPath="/data1_1T/dolphinscheduler"

# Which user to use for deployment
# Note: The deployment user needs sudo permissions and must be able to operate hdfs.
#       If you use hdfs, the root directory must be created by this user. Otherwise there will be permission-related issues.
deployUser="dolphinscheduler"

# The following is the alert service configuration
# Mail server address
mailServerHost="smtp.exmail.qq.com"

# Mail server port
mailServerPort="25"

# Sender
mailSender="xxxxxxxxxx"

# Sending user
mailUser="xxxxxxxxxx"

# Email password
mailPassword="xxxxxxxxxx"

# Set to true for a TLS mailbox, otherwise set to false
starttlsEnable="true"

# Set to true for an SSL mailbox, otherwise false. Note: starttlsEnable and sslEnable cannot both be true at the same time
sslEnable="false"

# Mail service address value, same as mailServerHost
sslTrust="smtp.exmail.qq.com"

# Where to upload resource files such as sql files used for business; options: HDFS, S3, NONE.
# If you want to upload to HDFS, configure HDFS; if you do not need the resource upload function, select NONE.
resourceStorageType="NONE"

# If S3, write the S3 address, HA, for example: s3a://dolphinscheduler
# Note: for s3 be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"

# If the resourceStorageType is S3, the following parameters need to be configured:
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"

# If the ResourceManager is HA, configure the primary and secondary IPs or hostnames of the ResourceManager nodes,
# such as "192.168.xx.xx,192.168.xx.xx"; otherwise, if it is a single ResourceManager or yarn is not used at all,
# configure yarnHaIps="" and that's it
yarnHaIps="192.168.xx.xx,192.168.xx.xx"

# If it is a single ResourceManager, configure the ResourceManager node IP or hostname, otherwise keep the default value.
singleYarnIp="yarnIp1"

# The storage path of resource files in HDFS/S3
resourceUploadPath="/dolphinscheduler"

# HDFS/S3 operating user
hdfsRootUser="hdfs"

# The following is the kerberos configuration
# Whether kerberos is enabled
kerberosStartUp="false"

# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"

# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"

# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"

# api service port
apiServerPort="12345"

# Hostnames of all hosts where DS is deployed
ips="ds1,ds2,ds3,ds4,ds5"

# ssh port, default 22
sshPort="22"

# Hosts where the master service is deployed
masters="ds1,ds2"

# Hosts where the worker service is deployed
# Note: Each worker needs a worker group name; the default value is "default"
workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"

# Host where the alert service is deployed
alertServer="ds3"

# Hosts where the api service is deployed
apiServers="ds1"

11.dolphinscheduler_env.sh [Environment variable configuration]

When a task is submitted via a shell-like mechanism, the environment variables in this configuration file are loaded on the host. The task types involved include Shell tasks, Python tasks, Spark tasks, Flink tasks, Datax tasks, etc.

export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
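
For example, a Shell task defined in DS can then use these variables directly at runtime; the script below is a hypothetical task, not part of DS itself.

#!/bin/sh
# hypothetical DS Shell task: variables exported in dolphinscheduler_env.sh
# (e.g. SPARK_HOME2) are available on the worker host when the task runs
$SPARK_HOME2/bin/spark-submit --version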

12.Service log configuration files

| Corresponding service | Log configuration file |
|--|--|
| api service | logback-api.xml |
| Master service | logback-master.xml |
| Worker service | logback-worker.xml |
| alert service | logback-alert.xml |
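
Adjusting a service's log output only requires editing the corresponding file. As a sketch, a hypothetical root-level change in logback-worker.xml might look like the following; the appender name is an assumption about your existing configuration.

<!-- hypothetical snippet for logback-worker.xml: set the root log level to INFO -->
<!-- the STDOUT appender name is assumed to exist in your configuration -->
<root level="INFO">
    <appender-ref ref="STDOUT"/>
</root>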