---
layout: doc_page
---

# Configuring Druid

This page describes the common configuration shared by all Druid nodes. These configurations can be defined in the `common.runtime.properties` file.

## JVM Configuration Best Practices

There are four JVM parameters that we set on all of our processes; a sample `jvm.config` using them appears after this list.

1. `-Duser.timezone=UTC` This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they also might uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see query granularities.
2. `-Dfile.encoding=UTF-8` This is similar to the timezone: we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs.
3. `-Djava.io.tmpdir=<a path>` Various parts of the system that interact with the file system do so via temporary files, and these files can get somewhat large. Many production systems are set up to have small (but fast) `/tmp` directories, which can be problematic with Druid, so we recommend pointing the JVM's tmp directory to something with a little more meat.
4. `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This allows log4j2 to handle logs for non-log4j2 components (like Jetty) which use standard Java logging.
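
For illustration, a minimal `jvm.config` that applies all four flags might look like the following sketch; the heap sizes and tmp path are placeholders, not recommendations:

```
-server
-Xms4g
-Xmx4g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
# Point the JVM tmp directory at a volume with plenty of space
-Djava.io.tmpdir=/mnt/druid/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
```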

## Extensions

Many of Druid's external dependencies can be plugged in as modules. Extensions can be provided using the following configs:

|Property|Description|Default|
|--------|-----------|-------|
|`druid.extensions.directory`|The root extension directory where the user can put extension-related files. Druid will load extensions stored under this directory.|`extensions` (This is a relative path to Druid's working directory)|
|`druid.extensions.hadoopDependenciesDir`|The root Hadoop dependencies directory where the user can put Hadoop-related dependency files. Druid will load the dependencies based on the Hadoop coordinate specified in the Hadoop index task.|`hadoop-dependencies` (This is a relative path to Druid's working directory)|
|`druid.extensions.hadoopContainerDruidClasspath`|Hadoop indexing launches Hadoop jobs, and this configuration provides a way to explicitly set the user classpath for the Hadoop job. By default this is computed automatically by Druid based on the Druid process classpath and the set of extensions. However, sometimes you might want to be explicit to resolve dependency conflicts between Druid and Hadoop.|null|
|`druid.extensions.loadList`|A JSON array of extensions to load from extension directories by Druid. If it is not specified, its value will be `null` and Druid will load all the extensions under `druid.extensions.directory`. If its value is the empty list `[]`, then no extensions will be loaded at all.|null|
|`druid.extensions.searchCurrentClassloader`|This is a boolean flag that determines if Druid will search the main classloader for extensions. It defaults to true but can be turned off if you have reason to not automatically add all modules on the classpath.|true|
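
For example, a `common.runtime.properties` sketch that loads an explicit set of extensions might look like this (the extension names are illustrative):

```
druid.extensions.directory=extensions
# Load only these two extensions; all others under the directory are ignored
druid.extensions.loadList=["mysql-metadata-storage","druid-hdfs-storage"]
```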

## Zookeeper

We recommend just setting the base ZK path and the ZK service host, but all ZK paths that Druid uses can be overwritten to absolute paths.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.paths.base`|Base Zookeeper path.|`/druid`|
|`druid.zk.service.host`|The ZooKeeper hosts to connect to. This is a REQUIRED property and therefore a host address must be supplied.|none|
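
For example (the hostnames are placeholders):

```
# Connect to a three-node ZooKeeper ensemble and keep all Druid paths under /druid
druid.zk.service.host=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
druid.zk.paths.base=/druid
```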

### Zookeeper Behavior

|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.service.sessionTimeoutMs`|ZooKeeper session timeout, in milliseconds.|30000|
|`druid.zk.service.compress`|Boolean flag for whether or not created Znodes should be compressed.|true|
|`druid.zk.service.acl`|Boolean flag for whether or not to enable ACL security for ZooKeeper. If ACL is enabled, zNode creators will have all permissions.|false|

### Path Configuration

Druid interacts with ZK through a set of standard path configurations. We recommend just setting the base ZK path, but all ZK paths that Druid uses can be overwritten to absolute paths.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.paths.base`|Base Zookeeper path.|`/druid`|
|`druid.zk.paths.propertiesPath`|Zookeeper properties path.|`${druid.zk.paths.base}/properties`|
|`druid.zk.paths.announcementsPath`|Druid node announcement path.|`${druid.zk.paths.base}/announcements`|
|`druid.zk.paths.liveSegmentsPath`|Current path for where Druid nodes announce their segments.|`${druid.zk.paths.base}/segments`|
|`druid.zk.paths.loadQueuePath`|Entries here cause historical nodes to load and drop segments.|`${druid.zk.paths.base}/loadQueue`|
|`druid.zk.paths.coordinatorPath`|Used by the coordinator for leader election.|`${druid.zk.paths.base}/coordinator`|
|`druid.zk.paths.servedSegmentsPath`|@Deprecated. Legacy path for where Druid nodes announce their segments.|`${druid.zk.paths.base}/servedSegments`|

The indexing service also uses its own set of paths. These configs can be included in the common configuration.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.zk.paths.indexer.base`|Base Zookeeper path for the indexing service.|`${druid.zk.paths.base}/indexer`|
|`druid.zk.paths.indexer.announcementsPath`|Middle managers announce themselves here.|`${druid.zk.paths.indexer.base}/announcements`|
|`druid.zk.paths.indexer.tasksPath`|Used to assign tasks to middle managers.|`${druid.zk.paths.indexer.base}/tasks`|
|`druid.zk.paths.indexer.statusPath`|Parent path for announcement of task statuses.|`${druid.zk.paths.indexer.base}/status`|
|`druid.zk.paths.indexer.leaderLatchPath`|Used for Overlord leader election.|`${druid.zk.paths.indexer.base}/leaderLatchPath`|

If `druid.zk.paths.base` and `druid.zk.paths.indexer.base` are both set, and none of the other `druid.zk.paths.*` or `druid.zk.paths.indexer.*` values are set, then the other properties will be evaluated relative to their respective base. For example, if `druid.zk.paths.base` is set to `/druid1` and `druid.zk.paths.indexer.base` is set to `/druid2` then `druid.zk.paths.announcementsPath` will default to `/druid1/announcements` while `druid.zk.paths.indexer.announcementsPath` will default to `/druid2/announcements`.
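
Expressed as properties, that example looks like this:

```
# Only the two bases are set; every other ZK path is derived from them
druid.zk.paths.base=/druid1
druid.zk.paths.indexer.base=/druid2
# druid.zk.paths.announcementsPath resolves to /druid1/announcements
# druid.zk.paths.indexer.announcementsPath resolves to /druid2/announcements
```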

The following path is used for service discovery. It is not affected by `druid.zk.paths.base` and must be specified separately.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.discovery.curator.path`|Services announce themselves under this ZooKeeper path.|`/druid/discovery`|

## Startup Logging

All nodes can log debugging information on startup.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.startup.logging.logProperties`|Log all properties on startup (from `common.runtime.properties`, `runtime.properties`, and the JVM command line).|false|

Note that some sensitive information may be logged if these settings are enabled.
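
For example, to log all properties at startup:

```
# Dump every effective property to the log at startup
druid.startup.logging.logProperties=true
```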

## Request Logging

All nodes that can serve queries can also log the query requests they see.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.request.logging.type`|Choices: `noop`, `file`, `emitter`. How to log every query request.|noop|

Note that you can log all HTTP requests by setting the `io.druid.jetty.RequestLog` logger to DEBUG level. See Logging.

### File Request Logging

Daily request logs are stored on disk.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.request.logging.dir`|Historical, Realtime, and Broker nodes maintain request logs of all of the requests they get (interaction is via POST, so normal request logs don't generally capture information about the actual query); this specifies the directory to store the request logs in.|none|
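
For example, to write daily request logs to a local directory (the path is a placeholder):

```
druid.request.logging.type=file
druid.request.logging.dir=/var/log/druid/requests
```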

### Emitter Request Logging

Every request is emitted to some external location.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.request.logging.feed`|Feed name for requests.|none|

## Enabling Metrics

Druid nodes periodically emit metrics and different metrics monitors can be included. Each node can overwrite the default list of monitors.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.monitoring.emissionPeriod`|How often metrics are emitted.|PT1m|
|`druid.monitoring.monitors`|Sets the list of Druid monitors used by a node. See below for names and more information. For example, you can specify monitors for a Broker with `druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]`.|none (no monitors)|

The following monitors are available:

|Name|Description|
|----|-----------|
|`io.druid.client.cache.CacheMonitor`|Emits metrics (to logs) about the segment results cache for Historical and Broker nodes. Reports typical cache statistics, including hits, misses, rates, and size (bytes and number of entries), as well as timeouts and errors.|
|`com.metamx.metrics.SysMonitor`|This uses the SIGAR library to report on various system activities and statuses. Make sure to add the sigar library jar to your classpath if using this monitor.|
|`io.druid.server.metrics.HistoricalMetricsMonitor`|Reports statistics on Historical nodes.|
|`com.metamx.metrics.JvmMonitor`|Reports JVM-related statistics.|
|`io.druid.segment.realtime.RealtimeMetricsMonitor`|Reports statistics on Realtime nodes.|
|`io.druid.server.metrics.EventReceiverFirehoseMonitor`|Reports how many events have been queued in the EventReceiverFirehose.|
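
Putting these together, a Broker that emits JVM and system metrics once a minute (mirroring the example above) might use:

```
druid.monitoring.emissionPeriod=PT1m
druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]
```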

## Emitting Metrics

The Druid servers emit various metrics and alerts via something we call an Emitter. There are three emitter implementations included with the code: a `noop` emitter, one that just logs to log4j (`logging`, which is used by default if no emitter is specified), and one that does POSTs of JSON events to a server (`http`). The properties for using the logging emitter are described below.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.emitter`|Setting this value to `noop`, `logging`, or `http` will initialize one of the emitter modules. The value `composing` can be used to initialize multiple emitter modules.|noop|

### Logging Emitter Module

|Property|Description|Default|
|--------|-----------|-------|
|`druid.emitter.logging.loggerClass`|Choices: HttpPostEmitter, LoggingEmitter, NoopServiceEmitter, ServiceEmitter. The class used for logging.|LoggingEmitter|
|`druid.emitter.logging.logLevel`|Choices: debug, info, warn, error. The log level at which messages are logged.|info|

### Http Emitter Module

|Property|Description|Default|
|--------|-----------|-------|
|`druid.emitter.http.timeOut`|The timeout for data reads.|PT5M|
|`druid.emitter.http.flushMillis`|How often the internal message buffer is flushed (data is sent).|60000|
|`druid.emitter.http.flushCount`|How many messages the internal message buffer can hold before flushing (sending).|500|
|`druid.emitter.http.recipientBaseUrl`|The base URL to emit messages to. Druid will POST JSON to be consumed at the HTTP endpoint specified by this property.|none|
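
For example, to POST all metrics to an external collector (the URL is a placeholder):

```
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://metrics.example.com:8080/events
# Flush the internal buffer once per minute, or after 500 buffered messages
druid.emitter.http.flushMillis=60000
druid.emitter.http.flushCount=500
```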

### Composing Emitter Module

|Property|Description|Default|
|--------|-----------|-------|
|`druid.emitter.composing.emitters`|List of emitter modules to load, e.g. `["logging","http"]`.|[]|
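
For example, to log every event locally and also POST it over HTTP:

```
druid.emitter=composing
druid.emitter.composing.emitters=["logging","http"]
```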

### Graphite Emitter

To use Graphite as the emitter, set `druid.emitter=graphite`. For configuration details, see the Graphite emitter extension documentation.

## Metadata Storage

These properties specify the JDBC connection and other configuration around the metadata storage. The only processes that connect to the metadata storage with these properties are the Coordinator, the Indexing service, and Realtime nodes.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.metadata.storage.type`|The type of metadata storage to use. Choose from `mysql`, `postgresql`, or `derby`.|derby|
|`druid.metadata.storage.connector.connectURI`|The JDBC URI for the database to connect to.|none|
|`druid.metadata.storage.connector.user`|The username to connect with.|none|
|`druid.metadata.storage.connector.password`|The password to connect with.|none|
|`druid.metadata.storage.connector.createTables`|If Druid requires a table and it doesn't exist, create it?|true|
|`druid.metadata.storage.tables.base`|The base name for tables.|druid|
|`druid.metadata.storage.tables.segments`|The table to use to look for segments.|druid_segments|
|`druid.metadata.storage.tables.rules`|The table to use to look for segment load/drop rules.|druid_rules|
|`druid.metadata.storage.tables.config`|The table to use to look for configs.|druid_config|
|`druid.metadata.storage.tables.tasks`|Used by the indexing service to store tasks.|druid_tasks|
|`druid.metadata.storage.tables.taskLog`|Used by the indexing service to store task logs.|druid_taskLog|
|`druid.metadata.storage.tables.taskLock`|Used by the indexing service to store task locks.|druid_taskLock|
|`druid.metadata.storage.tables.supervisors`|Used by the indexing service to store supervisor configurations.|druid_supervisors|
|`druid.metadata.storage.tables.audit`|The table to use for audit history of configuration changes, e.g. Coordinator rules.|druid_audit|
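
For example, a sketch pointing Druid at a MySQL instance (host, database name, and credentials are placeholders; note that MySQL support ships as a separate extension, which must also be loaded):

```
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
druid.metadata.storage.connector.createTables=true
```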

## Deep Storage

These configurations concern how to push segments to, and pull them from, deep storage.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.storage.type`|Choices: `local`, `noop`, `s3`, `hdfs`, `c*`. The type of deep storage to use.|local|

### Local Deep Storage

Local deep storage uses the local filesystem.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.storage.storageDirectory`|Directory on disk to use as deep storage.|`/tmp/druid/localStorage`|

### Noop Deep Storage

This deep storage doesn't do anything. There are no configs.

### S3 Deep Storage

This deep storage is used to interface with Amazon's S3.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.s3.accessKey`|The access key to use to access S3.|none|
|`druid.s3.secretKey`|The secret key to use to access S3.|none|
|`druid.storage.bucket`|S3 bucket name.|none|
|`druid.storage.baseKey`|S3 object key prefix for storage.|none|
|`druid.storage.disableAcl`|Boolean flag for disabling ACLs on pushed objects.|false|
|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the indexing-service archive task.|none|
|`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none|
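
A representative S3 configuration might look like the following (bucket, prefix, and credentials are placeholders):

```
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=<access-key>
druid.s3.secretKey=<secret-key>
```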

### HDFS Deep Storage

This deep storage is used to interface with HDFS.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.storage.storageDirectory`|HDFS directory to use as deep storage.|none|

### Cassandra Deep Storage

This deep storage is used to interface with Cassandra.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.storage.host`|Cassandra host.|none|
|`druid.storage.keyspace`|Cassandra key space.|none|

## Caching

You can enable caching of results at the Broker, Historical, or Realtime level using the following configurations.

|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|
|`druid.cache.type`|`local`, `memcached`|The type of cache to use for queries.|`local`|
|`druid.(broker\|historical\|realtime).cache.unCacheable`|All druid query types|All query types to not cache.|`["groupBy", "select"]`|
|`druid.(broker\|historical\|realtime).cache.useCache`|true, false|Whether to use the cache to serve query results.|false|
|`druid.(broker\|historical\|realtime).cache.populateCache`|true, false|Whether to populate the cache with query results.|false|

### Local Cache

|Property|Description|Default|
|--------|-----------|-------|
|`druid.cache.sizeInBytes`|Maximum cache size in bytes. You must set this if you enable populateCache/useCache; otherwise the default size of zero won't cache anything.|0|
|`druid.cache.initialSize`|Initial size of the hashtable backing the cache.|500000|
|`druid.cache.logEvictionCount`|If non-zero, log cache eviction every `logEvictionCount` items.|0|

### Memcache

|Property|Description|Default|
|--------|-----------|-------|
|`druid.cache.expiration`|Memcached expiration time.|2592000 (30 days)|
|`druid.cache.timeout`|Maximum time in milliseconds to wait for a response from Memcached.|500|
|`druid.cache.hosts`|Comma-separated list of Memcached hosts (`host:port`).|none|
|`druid.cache.maxObjectSize`|Maximum object size in bytes for a Memcached object.|52428800 (50 MB)|
|`druid.cache.memcachedPrefix`|Key prefix for all keys in Memcached.|druid|
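
For example, to turn on a local result cache at the Broker (the 1 GB size is illustrative):

```
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=1000000000
```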

## Indexing Service Discovery

This config is used to find the Indexing Service using Curator service discovery. Only required if you are actually running an indexing service.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.selectors.indexing.serviceName`|The `druid.service` name of the indexing service Overlord node. To start the Overlord with a different name, set it with this property.|druid/overlord|

## Coordinator Discovery

This config is used to find the Coordinator using Curator service discovery. This config is used by the realtime indexing nodes to get information about the segments loaded in the cluster.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.selectors.coordinator.serviceName`|The `druid.service` name of the Coordinator node. To start the Coordinator with a different name, set it with this property.|druid/coordinator|

## Announcing Segments

You can configure how to announce and unannounce Znodes in ZooKeeper (using Curator). For normal operations you do not need to override any of these configs.

### Batch Data Segment Announcer

In current Druid, multiple data segments may be announced under the same Znode.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.announcer.segmentsPerNode`|Each Znode contains info for up to this many segments.|50|
|`druid.announcer.maxBytesPerNode`|Max byte size for a Znode.|524288|
|`druid.announcer.skipDimensionsAndMetrics`|Skip the dimensions and metrics list in segment announcements. NOTE: Enabling this will also remove the dimensions and metrics list from Coordinator and Broker endpoints.|false|
|`druid.announcer.skipLoadSpec`|Skip the segment LoadSpec in segment announcements. NOTE: Enabling this will also remove the loadspec from Coordinator and Broker endpoints.|false|

## JavaScript

Druid supports dynamic runtime extension through JavaScript functions. This functionality can be configured through the following properties.

|Property|Description|Default|
|--------|-----------|-------|
|`druid.javascript.disabled`|Set to "true" to disable JavaScript functionality. This affects the JavaScript parser, filter, extractionFn, aggregator, and post-aggregator.|false|
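
For example, a locked-down cluster might disable all JavaScript-based features:

```
druid.javascript.disabled=true
```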