---
layout: doc_page
title: "S3-compatible"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements. See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership. The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License. You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied. See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
-->

# S3-compatible

To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-s3-extensions` as an extension.

## Deep Storage

S3-compatible deep storage means either AWS S3 or a compatible service, such as Google Cloud Storage, that exposes the same API as S3.

### Configuration

The AWS SDK requires that the target region be specified. You can do this either with the JVM system property `aws.region` or with the environment variable `AWS_REGION`.

As an example, to set the region to `us-east-1` through system properties:

- Add `-Daws.region=us-east-1` to the jvm.config file for all Druid services.
- Add `-Daws.region=us-east-1` to `druid.indexer.runner.javaOpts` in middleManager/runtime.properties so that the property is passed on to the Peon (worker) processes.

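Following the steps above, the relevant configuration lines might look like the following sketch (the surrounding `-server -Xmx2g` options are placeholders for whatever JVM options you already pass to Peons):

```properties
# jvm.config of every Druid service: add the region property to the JVM arguments.
-Daws.region=us-east-1

# middleManager/runtime.properties: append the property to any existing javaOpts
# so that Peon processes inherit it.
druid.indexer.runner.javaOpts=-server -Xmx2g -Daws.region=us-east-1
```
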
|Property|Description|Default|
|--------|-----------|-------|
|`druid.s3.accessKey`|S3 access key.|Must be set.|
|`druid.s3.secretKey`|S3 secret key.|Must be set.|
|`druid.storage.bucket`|Bucket to store segments in.|Must be set.|
|`druid.storage.baseKey`|Base key prefix to use, i.e. which directory.|Must be set.|
|`druid.storage.sse.type`|Server-side encryption type. Should be one of `s3`, `kms`, or `custom`. See the [Server-side encryption section](#server-side-encryption) below for more details.|None|
|`druid.storage.sse.kms.keyId`|AWS KMS key ID. Used only when `druid.storage.sse.type` is `kms`; can be left empty to use the default key.|None|
|`druid.storage.sse.custom.base64EncodedKey`|Base64-encoded key. Must be specified if `druid.storage.sse.type` is `custom`.|None|
|`druid.s3.protocol`|Communication protocol to use when sending requests to AWS. `http` or `https` can be used.|`https`|
|`druid.s3.disableChunkedEncoding`|Disables chunked encoding. See the [AWS documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#disableChunkedEncoding--) for details.|false|
|`druid.s3.enablePathStyleAccess`|Enables path-style access. See the [AWS documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#enablePathStyleAccess--) for details.|false|
|`druid.s3.forceGlobalBucketAccessEnabled`|Enables global bucket access. See the [AWS documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setForceGlobalBucketAccessEnabled-java.lang.Boolean-) for details.|false|
|`druid.s3.endpoint.url`|Service endpoint, either with or without the protocol.|None|
|`druid.s3.endpoint.signingRegion`|Region to use for SigV4 signing of requests (e.g. `us-west-1`).|None|
|`druid.s3.proxy.host`|Proxy host to connect through.|None|
|`druid.s3.proxy.port`|Port on the proxy host to connect through.|None|
|`druid.s3.proxy.username`|User name to use when connecting through a proxy.|None|
|`druid.s3.proxy.password`|Password to use when connecting through a proxy.|None|

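Putting these properties together, a minimal deep-storage configuration might look like the following sketch. The bucket name, keys, and endpoint are placeholders; `druid.storage.type=s3` selects this extension as the deep storage implementation, and the endpoint properties are only needed for non-AWS, S3-compatible services:

```properties
druid.extensions.loadList=["druid-s3-extensions"]

# Use S3 as deep storage.
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments

# Placeholder credentials; substitute your own.
druid.s3.accessKey=AKIA...
druid.s3.secretKey=...

# Only needed for non-AWS, S3-compatible services:
#druid.s3.endpoint.url=https://storage.example.com
#druid.s3.endpoint.signingRegion=us-east-1
```
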
## Server-side encryption

You can enable [server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html) by setting
`druid.storage.sse.type` to a supported type of server-side encryption. The currently supported types are:

- `s3`: [Server-side encryption with S3-managed encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html)
- `kms`: [Server-side encryption with AWS KMS-managed keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html)
- `custom`: [Server-side encryption with customer-provided encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html)

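For example, to encrypt uploaded segments with a specific KMS key, the configuration might look like this sketch (the key ID is a placeholder):

```properties
druid.storage.sse.type=kms
# Optional; omit to use your account's default KMS key.
druid.storage.sse.kms.keyId=<your-kms-key-id>
```
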
## StaticS3Firehose

This firehose ingests events from a predefined list of S3 objects.
This firehose is _splittable_ and can be used by [native parallel index tasks](../../ingestion/native_tasks.html#parallel-index-task).
Since each split represents an object in this firehose, each worker task of `index_parallel` will read one object.

Sample spec:

```json
"firehose" : {
    "type" : "static-s3",
    "uris": ["s3://foo/bar/file.gz", "s3://bar/foo/file2.gz"]
}
```

This firehose provides caching and prefetching features. During indexing with `IndexTask`, the firehose can be read twice if `intervals` or `shardSpecs` are not specified, in which case caching is useful. Prefetching is preferred when directly scanning objects is slow.

|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be `static-s3`.|N/A|yes|
|uris|JSON array of URIs where the S3 objects to be ingested are located.|N/A|`uris` or `prefixes` must be set|
|prefixes|JSON array of URI prefixes for the locations of S3 objects to be ingested.|N/A|`uris` or `prefixes` must be set|
|maxCacheCapacityBytes|Maximum size of the cache space in bytes. A value of 0 disables the cache. Cached files are not removed until the ingestion task completes.|1073741824|no|
|maxFetchCapacityBytes|Maximum size of the fetch space in bytes. A value of 0 disables prefetching. Prefetched files are removed immediately after they are read.|1073741824|no|
|prefetchTriggerBytes|Threshold to trigger prefetching of S3 objects.|maxFetchCapacityBytes / 2|no|
|fetchTimeout|Timeout, in milliseconds, for fetching an S3 object.|60000|no|
|maxFetchRetry|Maximum number of retries for fetching an S3 object.|3|no|
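
As a sketch of the options above, a firehose that ingests every object under a prefix while tightening the prefetch limits might look like this (the bucket, prefix, and sizes are placeholders):

```json
"firehose" : {
    "type" : "static-s3",
    "prefixes": ["s3://foo/bar/"],
    "maxFetchCapacityBytes": 268435456,
    "fetchTimeout": 120000
}
```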