
S3-compatible

Make sure to include druid-s3-extensions in the extension load list (druid.extensions.loadList).

Deep Storage

S3-compatible deep storage means either AWS S3 or a compatible service, such as Google Cloud Storage, that exposes the same API as S3.

Configuration

The AWS SDK requires that the target region be specified. Two ways to do this are setting the JVM system property aws.region or the environment variable AWS_REGION.

For example, to set the region to us-east-1 through system properties:

  • Add -Daws.region=us-east-1 to the jvm.config file for all Druid services.
  • Add -Daws.region=us-east-1 to druid.indexer.runner.javaOpts in middleManager/runtime.properties so that the property is passed to peon (worker) processes.
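
Editing the peon options by hand is easy to get wrong when druid.indexer.runner.javaOpts already holds other flags. The Python sketch below is a hypothetical helper, not part of Druid. It appends -Daws.region to that property in the text of a runtime.properties file and replaces any earlier aws.region flag. It assumes that the property is written on a single line in key=value form and that the options are whitespace-separated.

```python
JAVA_OPTS_KEY = "druid.indexer.runner.javaOpts"


def add_region_to_java_opts(properties_text: str, region: str) -> str:
    """Hypothetical helper: set -Daws.region=<region> in the peon javaOpts line.

    Assumes `key=value` on one line and whitespace-separated JVM options.
    """
    flag = f"-Daws.region={region}"
    out = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Match the exact key (not e.g. javaOptsArray); comment lines start with '#'.
        if stripped.startswith(JAVA_OPTS_KEY) and stripped[len(JAVA_OPTS_KEY):].lstrip().startswith("="):
            found = True
            name, _, value = line.partition("=")  # split on the first '=' only
            opts = [o for o in value.split() if not o.startswith("-Daws.region=")]
            opts.append(flag)
            line = f"{name.rstrip()}={' '.join(opts)}"
        out.append(line)
    if not found:
        # No javaOpts line yet: add one that carries only the region flag.
        out.append(f"{JAVA_OPTS_KEY}={flag}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    text = "druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC\n"
    print(add_region_to_java_opts(text, "us-east-1"), end="")
```

Splitting on the first "=" keeps options such as -Duser.timezone=UTC intact. Any existing aws.region flag is dropped so that only one remains.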

| Property | Description | Default |
|---|---|---|
| druid.s3.accessKey | S3 access key. | Must be set. |
| druid.s3.secretKey | S3 secret key. | Must be set. |
| druid.storage.bucket | Bucket to store in. | Must be set. |
| druid.storage.baseKey | Base key prefix to use, that is, the directory within the bucket. | Must be set. |
| druid.storage.sse.type | Server-side encryption type. Should be one of s3, kms, and custom. See the Server-side encryption section below for details. | None |
| druid.storage.sse.kms.keyId | AWS KMS key ID. Used only when druid.storage.sse.type is kms, and can be left empty to use the default KMS key. | None |
| druid.storage.sse.custom.base64EncodedKey | Base64-encoded key. Required when druid.storage.sse.type is custom. | None |
| druid.s3.disableChunkedEncoding | Disables chunked encoding. See the AWS documentation for details. | false |
| druid.s3.enablePathStyleAccess | Enables path-style access. See the AWS documentation for details. | false |
| druid.s3.forceGlobalBucketAccessEnabled | Enables global bucket access. See the AWS documentation for details. | false |
| druid.s3.endpoint.url | Service endpoint, with or without the protocol. | None |
| druid.s3.endpoint.signingRegion | Region to use for SigV4 signing of requests (for example, us-west-1). | None |
| druid.s3.proxy.host | Proxy host to connect through. | None |
| druid.s3.proxy.port | Port on the proxy host to connect through. | None |
| druid.s3.proxy.username | User name to use when connecting through a proxy. | None |
| druid.s3.proxy.password | Password to use when connecting through a proxy. | None |
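
The table implies a minimal deep storage configuration. The sketch below is a hypothetical Python helper that renders those runtime.properties lines. It refuses to do so when a property marked "Must be set" is missing. It also adds druid.storage.type=s3, which selects S3 deep storage in Druid but is not listed in the table. The bucket, base key, and endpoint in the usage are placeholders.

```python
# Properties the table marks as "Must be set".
REQUIRED = (
    "druid.s3.accessKey",
    "druid.s3.secretKey",
    "druid.storage.bucket",
    "druid.storage.baseKey",
)


def render_s3_deep_storage(props: dict) -> str:
    """Hypothetical helper: render runtime.properties lines for S3 deep storage."""
    missing = [name for name in REQUIRED if not props.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    # druid.storage.type=s3 selects S3 deep storage; callers may still override it.
    merged = {"druid.storage.type": "s3", **props}
    return "".join(f"{name}={value}\n" for name, value in merged.items())


if __name__ == "__main__":
    print(render_s3_deep_storage({
        "druid.s3.accessKey": "YOUR_ACCESS_KEY",       # placeholder
        "druid.s3.secretKey": "YOUR_SECRET_KEY",       # placeholder
        "druid.storage.bucket": "my-druid-bucket",     # hypothetical bucket
        "druid.storage.baseKey": "druid/segments",     # hypothetical prefix
        # Only for an S3-compatible service other than AWS S3 (hypothetical URL):
        "druid.s3.endpoint.url": "https://storage.example.com",
        "druid.s3.enablePathStyleAccess": "true",
    }), end="")
```

The endpoint and path-style settings are optional. They are shown here only because self-hosted S3-compatible services often need them.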

Server-side encryption

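You can enable server-side encryption by setting druid.storage.sse.type to a supported type of server-side encryption. The currently supported types are:

  • s3: server-side encryption with S3-managed keys (SSE-S3).
  • kms: server-side encryption with AWS KMS-managed keys (SSE-KMS). Set druid.storage.sse.kms.keyId to choose the key.
  • custom: server-side encryption with customer-provided keys (SSE-C). Set druid.storage.sse.custom.base64EncodedKey to the key.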
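
With the custom type, you generate and store the key outside Druid. SSE-C uses a 256-bit AES key, so the sketch below draws 32 random bytes and base64-encodes them. check_sse is a hypothetical consistency check for the SSE properties and is not something Druid runs.

```python
import base64
import secrets

SSE_TYPES = ("s3", "kms", "custom")  # values accepted by druid.storage.sse.type


def generate_sse_c_key() -> str:
    """Return a random 256-bit key, base64-encoded, for the custom (SSE-C) type."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def check_sse(props: dict) -> None:
    """Hypothetical check: raise ValueError if the SSE properties are inconsistent."""
    sse_type = props.get("druid.storage.sse.type")
    if sse_type is None:
        return  # server-side encryption not configured
    if sse_type not in SSE_TYPES:
        raise ValueError(f"unsupported druid.storage.sse.type: {sse_type!r}")
    if sse_type == "custom":
        key = props.get("druid.storage.sse.custom.base64EncodedKey")
        if not key:
            raise ValueError("custom SSE requires druid.storage.sse.custom.base64EncodedKey")
        # b64decode raises binascii.Error (a ValueError subclass) on malformed input.
        if len(base64.b64decode(key, validate=True)) != 32:
            raise ValueError("SSE-C key must decode to 32 bytes (256 bits)")


if __name__ == "__main__":
    key = generate_sse_c_key()
    check_sse({"druid.storage.sse.type": "custom",
               "druid.storage.sse.custom.base64EncodedKey": key})
    print(f"druid.storage.sse.custom.base64EncodedKey={key}")
```

Keep the generated key somewhere safe. Objects encrypted with SSE-C cannot be read without the same key.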

StaticS3Firehose

This firehose ingests events from a predefined list of S3 objects. It is splittable and can be used by native parallel index tasks. Each split corresponds to one object, so each worker task of index_parallel reads one object.

Sample spec:

```json
"firehose" : {
    "type" : "static-s3",
    "uris": ["s3://foo/bar/file.gz", "s3://bar/foo/file2.gz"]
}
```

This firehose provides caching and prefetching features. In an IndexTask, the firehose may be read twice if intervals or shardSpecs are not specified. In that case, caching lets the second read reuse the local copies instead of fetching the objects from S3 again. Prefetching is preferable when scanning objects directly from S3 is slow.
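
The sketch below makes the interaction between maxFetchCapacityBytes and prefetchTriggerBytes concrete. It is a toy model, assumed for illustration, and not Druid's fetcher implementation. Objects are read in order. Whenever the fetched-but-unread bytes fall to the trigger or below, the model fetches more objects until the next one would exceed the fetch space.

```python
from collections import deque


def simulate_prefetch(object_sizes, max_fetch_capacity_bytes, prefetch_trigger_bytes=None):
    """Toy model of threshold-triggered prefetching (not Druid's code).

    Returns the number of fetch rounds needed to read every object in order.
    """
    if prefetch_trigger_bytes is None:
        # Mirrors the documented default: maxFetchCapacityBytes / 2.
        prefetch_trigger_bytes = max_fetch_capacity_bytes // 2
    if prefetch_trigger_bytes < 0:
        raise ValueError("prefetch_trigger_bytes must be >= 0")
    remote = deque(object_sizes)  # objects still in S3
    local = deque()               # fetched but not yet read
    local_bytes = 0
    rounds = 0
    while remote or local:
        if remote and local_bytes <= prefetch_trigger_bytes:
            fetched = 0
            # Fetch at least one object when nothing is buffered, so an object
            # larger than the fetch space can still be read.
            while remote and (not local or local_bytes + remote[0] <= max_fetch_capacity_bytes):
                size = remote.popleft()
                local.append(size)
                local_bytes += size
                fetched += 1
            if fetched:
                rounds += 1
        # Read the next object; like prefetched files, it is dropped once read.
        local_bytes -= local.popleft()
    return rounds
```

Consider sizes [4, 4, 4, 4] with a capacity of 10. With the default trigger of 5, the model needs three fetch rounds, and each round overlaps with reading. With a trigger of 0 it needs only two, but each round starts only after the local space is empty, so reading would wait on the fetch.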

| Property | Description | Default | Required |
|---|---|---|---|
| type | Must be static-s3. | N/A | yes |
| uris | JSON array of URIs where the S3 objects to ingest are located. | N/A | uris or prefixes must be set |
| prefixes | JSON array of URI prefixes for the locations of the S3 objects to ingest. | N/A | uris or prefixes must be set |
| maxCacheCapacityBytes | Maximum size of the cache space, in bytes. Set to 0 to disable caching. Cached files are not removed until the ingestion task completes. | 1073741824 (1 GiB) | no |
| maxFetchCapacityBytes | Maximum size of the fetch space, in bytes. Set to 0 to disable prefetching. Prefetched files are removed as soon as they are read. | 1073741824 (1 GiB) | no |
| prefetchTriggerBytes | Threshold, in bytes, that triggers prefetching of S3 objects. | maxFetchCapacityBytes / 2 | no |
| fetchTimeout | Timeout for fetching an S3 object, in milliseconds. | 60000 | no |
| maxFetchRetry | Maximum number of retries when fetching an S3 object. | 3 | no |
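
You can also build a spec that uses prefixes and the tuning properties in code. The hypothetical Python helper below enforces only what the table states: at least one of uris or prefixes must be set, and every other key must be one of the optional properties. It does not check whether Druid accepts uris and prefixes together.

```python
import json

# Optional tuning properties from the static-s3 table.
TUNING_PROPERTIES = {
    "maxCacheCapacityBytes",
    "maxFetchCapacityBytes",
    "prefetchTriggerBytes",
    "fetchTimeout",
    "maxFetchRetry",
}


def static_s3_firehose(*, uris=None, prefixes=None, **tuning):
    """Hypothetical helper: build a static-s3 firehose spec fragment as a dict."""
    if not uris and not prefixes:
        raise ValueError("uris or prefixes must be set")
    unknown = sorted(set(tuning) - TUNING_PROPERTIES)
    if unknown:
        raise ValueError(f"unknown static-s3 properties: {unknown}")
    spec = {"type": "static-s3"}
    if uris:
        spec["uris"] = list(uris)
    if prefixes:
        spec["prefixes"] = list(prefixes)
    spec.update(tuning)
    return {"firehose": spec}


if __name__ == "__main__":
    spec = static_s3_firehose(
        prefixes=["s3://foo/bar/"],
        maxCacheCapacityBytes=0,           # disable caching
        maxFetchCapacityBytes=2147483648,  # 2 GiB of fetch space
    )
    print(json.dumps(spec, indent=2))
```

Because prefetchTriggerBytes is omitted here, its default would be 2147483648 / 2 = 1073741824 bytes.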