| --- |
| id: s3 |
| title: "S3-compatible" |
| --- |
| |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| ## S3 extension |
| |
This extension allows you to do two things:
| * [Ingest data](#reading-data-from-s3) from files stored in S3. |
| * Write segments to [deep storage](#deep-storage) in S3. |
| |
| To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-s3-extensions` in the extensions load list. |
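
For example, in `common.runtime.properties` (this sketch assumes `druid-s3-extensions` is the only extension you load; in practice, append it to your existing list):

```properties
druid.extensions.loadList=["druid-s3-extensions"]
```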
| |
| ### Reading data from S3 |
| |
| Use a native batch [Parallel task](../../ingestion/native-batch.md) with an [S3 input source](../../ingestion/input-sources.md#s3-input-source) to read objects directly from S3. |
| |
| Alternatively, use a [Hadoop task](../../ingestion/hadoop.md), |
| and specify S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec). |
| |
To read objects from S3, you must supply [connection information](#configuration) in your Druid configuration.
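
For illustration, a minimal `ioConfig` sketch for a Parallel task reading newline-delimited JSON from S3; the bucket and object key are hypothetical:

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "uris": ["s3://your-bucket/path/to/file.json"]
  },
  "inputFormat": {
    "type": "json"
  }
}
```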
| |
### Deep storage
| |
S3-compatible deep storage means either AWS S3 or a compatible service, such as Google Cloud Storage, that exposes the same API as S3.

S3 deep storage must be explicitly enabled by setting `druid.storage.type=s3`. **None of the settings below take effect until the storage type is set to `s3`.**

To use S3 for deep storage, you must supply [connection information](#configuration) *and* set the additional [deep storage specific configuration](#deep-storage-specific-configuration) described below.
| |
| #### Deep storage specific configuration |
| |
| |Property|Description|Default| |
| |--------|-----------|-------| |
|`druid.storage.bucket`|S3 bucket to store segments in.|Must be set.|
|`druid.storage.baseKey`|A prefix string prepended to the object names of segments published to S3 deep storage.|Must be set.|
| |`druid.storage.type`|Global deep storage provider. Must be set to `s3` to make use of this extension.|Must be set (likely `s3`).| |
| |`druid.storage.archiveBucket`|S3 bucket name for archiving when running the *archive task*.|none| |
| |`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none| |
| |`druid.storage.disableAcl`|Boolean flag for how object permissions are handled. To use ACLs, set this property to `false`. To use Object Ownership, set it to `true`. The permission requirements for ACLs and Object Ownership are different. For more information, see [S3 permissions settings](#s3-permissions-settings).|false| |
|`druid.storage.useS3aSchema`|If true, use the `s3a` filesystem for Hadoop-based ingestion; if false, use `s3n`. Only affects Hadoop-based ingestion.|false|
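
For example, a minimal sketch of the deep storage settings in `common.runtime.properties`; the bucket name and prefix are hypothetical:

```properties
druid.storage.type=s3
# Hypothetical bucket and prefix; replace with your own values
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
```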
| |
| ## Configuration |
| |
| ### S3 authentication methods |
| |
| You can provide credentials to connect to S3 in a number of ways, whether for [deep storage](#deep-storage) or as an [ingestion source](#reading-data-from-s3). |
| |
| The configuration options are listed in order of precedence. For example, if you would like to use profile information given in `~/.aws/credentials`, do not set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid config file because they would take precedence. |
| |
|Order|Type|Details|
|-----|----|-------|
|1|Druid config file|Uses the `druid.s3.accessKey` and `druid.s3.secretKey` values from your `runtime.properties` file, if both are set.|
|2|Custom properties file|Uses a custom properties file that supplies `sessionToken`, `accessKey`, and `secretKey` values. Provide this file to Druid through the `druid.s3.fileSessionCredentials` property.|
|3|Environment variables|Uses the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.|
|4|Java system properties|Uses the JVM properties `aws.accessKeyId` and `aws.secretKey`.|
|5|Profile information|Uses credentials stored on your Druid instance (generally in `~/.aws/credentials`).|
|6|ECS container credentials|Uses environment variables available on AWS ECS (`AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` or `AWS_CONTAINER_CREDENTIALS_FULL_URI`) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html).|
|7|Instance profile information|Uses the instance profile attached to your Druid instance.|
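
For example, a sketch of the custom properties file referenced by `druid.s3.fileSessionCredentials` (option 2 above); all three values are placeholders:

```properties
# Placeholder values, one key=value pair per line
sessionToken=<your-session-token>
accessKey=<your-access-key>
secretKey=<your-secret-key>
```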
| |
For more information, refer to the [Amazon Developer Guide](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html).
| |
| Alternatively, you can bypass this chain by specifying an access key and secret key using a [Properties Object](../../ingestion/input-sources.md#s3-input-source) inside your ingestion specification. |
| |
| Use the property [`druid.startup.logging.maskProperties`](../../configuration/index.md#startup-logging) to mask credentials information in Druid logs. For example, `["password", "secretKey", "awsSecretAccessKey"]`. |
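
For example, the corresponding line in `common.runtime.properties`:

```properties
druid.startup.logging.maskProperties=["password", "secretKey", "awsSecretAccessKey"]
```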
| |
| ### S3 permissions settings |
| |
| To manage the permissions for objects in an S3 bucket, you can use either ACLs or Object Ownership. The permissions required for each method are different. |
| |
| By default, Druid uses ACLs. With ACLs, any object that Druid puts into the bucket inherits the ACL settings from the bucket. |
| |
You can switch from ACLs to Object Ownership by setting `druid.storage.disableAcl` to `true`. With Object Ownership, the bucket owner owns every object that gets created, so you must use S3 bucket policies to manage permissions.
| |
| Note that this setting only affects Druid's behavior. Changing S3 to use Object Ownership requires additional configuration. For more information, see the AWS documentation on [Controlling ownership of objects and disabling ACLs for your bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html). |
| |
| #### ACL permissions |
| |
| If you're using ACLs, Druid needs the following permissions: |
| |
| - `s3:GetObject` |
| - `s3:PutObject` |
| - `s3:DeleteObject` |
| - `s3:GetBucketAcl` |
| - `s3:PutObjectAcl` |
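
For illustration, a minimal IAM policy sketch granting these actions; the bucket name is hypothetical and the policy is not a hardened example:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::my-druid-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::my-druid-bucket"
    }
  ]
}
```

Under Object Ownership (next section), the same sketch applies with the ACL-related actions (`s3:GetBucketAcl` and `s3:PutObjectAcl`) removed.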
| |
| #### Object Ownership permissions |
| |
| If you're using Object Ownership, Druid needs the following permissions: |
| |
| - `s3:GetObject` |
| - `s3:PutObject` |
| - `s3:DeleteObject` |
| |
| ### AWS region |
| |
The AWS SDK requires that a target region be specified. You can set the region using the JVM system property `aws.region` or the environment variable `AWS_REGION`.

For example, to set the region to `us-east-1` through system properties:
| |
| - Add `-Daws.region=us-east-1` to the `jvm.config` file for all Druid services. |
| - Add `-Daws.region=us-east-1` to `druid.indexer.runner.javaOpts` in [Middle Manager configuration](../../configuration/index.md#middlemanager-configuration) so that the property will be passed to Peon (worker) processes. |
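
For example, a sketch of the Middle Manager property; merge the flag into whatever options you already pass to Peons:

```properties
# Hypothetical example; append -Daws.region to your existing javaOpts
druid.indexer.runner.javaOpts=-server -Daws.region=us-east-1
```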
| |
### S3 connection configuration
| |
| |Property|Description|Default| |
| |--------|-----------|-------| |
|`druid.s3.accessKey`|S3 access key. See [S3 authentication methods](#s3-authentication-methods) for more details.|Can be omitted depending on the authentication method chosen.|
|`druid.s3.secretKey`|S3 secret key. See [S3 authentication methods](#s3-authentication-methods) for more details.|Can be omitted depending on the authentication method chosen.|
|`druid.s3.fileSessionCredentials`|Path to a properties file containing `sessionToken`, `accessKey`, and `secretKey` values, one key/value pair per line (format `key=value`). See [S3 authentication methods](#s3-authentication-methods) for more details.|Can be omitted depending on the authentication method chosen.|
|`druid.s3.protocol`|Communication protocol to use when sending requests to AWS. Valid values are `http` and `https`. This setting is ignored if `druid.s3.endpoint.url` specifies a URL with a different protocol.|`https`|
|`druid.s3.disableChunkedEncoding`|Disables chunked encoding. See the [AWS documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#disableChunkedEncoding--) for details.|false|
|`druid.s3.enablePathStyleAccess`|Enables path-style access. See the [AWS documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#enablePathStyleAccess--) for details.|false|
|`druid.s3.forceGlobalBucketAccessEnabled`|Enables global bucket access. See the [AWS documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setForceGlobalBucketAccessEnabled-java.lang.Boolean-) for details.|false|
| |`druid.s3.endpoint.url`|Service endpoint either with or without the protocol.|None| |
|`druid.s3.endpoint.signingRegion`|Region to use for SigV4 signing of requests (e.g. `us-west-1`).|None|
| |`druid.s3.proxy.host`|Proxy host to connect through.|None| |
| |`druid.s3.proxy.port`|Port on the proxy host to connect through.|None| |
| |`druid.s3.proxy.username`|User name to use when connecting through a proxy.|None| |
| |`druid.s3.proxy.password`|Password to use when connecting through a proxy.|None| |
|`druid.storage.sse.type`|Server-side encryption type. Must be one of `s3`, `kms`, or `custom`. See [Server-side encryption](#server-side-encryption) below for more details.|None|
| |`druid.storage.sse.kms.keyId`|AWS KMS key ID. This is used only when `druid.storage.sse.type` is `kms` and can be empty to use the default key ID.|None| |
| |`druid.storage.sse.custom.base64EncodedKey`|Base64-encoded key. Should be specified if `druid.storage.sse.type` is `custom`.|None| |
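
For example, a sketch of connecting to a non-AWS, S3-compatible service; the endpoint URL is hypothetical, and many such services require path-style access:

```properties
# Hypothetical endpoint for an S3-compatible service
druid.s3.endpoint.url=https://storage.example.com
druid.s3.endpoint.signingRegion=us-east-1
druid.s3.enablePathStyleAccess=true
druid.s3.accessKey=<your-access-key>
druid.s3.secretKey=<your-secret-key>
```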
| |
| ## Server-side encryption |
| |
You can enable [server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption) by setting
`druid.storage.sse.type` to a supported type of server-side encryption. The currently supported types are:

- `s3`: [Server-side encryption with S3-managed encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption)
- `kms`: [Server-side encryption with AWS KMS–managed keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption)
- `custom`: [Server-side encryption with customer-provided encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys)
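
For example, a minimal sketch enabling SSE with AWS KMS; the key ID is a placeholder:

```properties
druid.storage.sse.type=kms
# Placeholder key ID; leave empty to use the default KMS key
druid.storage.sse.kms.keyId=<your-kms-key-id>
```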