Google Cloud Storage

To use this extension, make sure to include the druid-google-extensions extension.

Deep Storage

Deep storage can be written to Google Cloud Storage either via this extension or the druid-hdfs-storage extension.

Configuration

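| Property | Possible Values | Description | Default |
|----------|-----------------|-------------|---------|
| druid.storage.type | google | | Must be set. |
| druid.google.bucket | | GCS bucket name. | Must be set. |
| druid.google.prefix | | GCS prefix. | Must be set. |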

Firehose

StaticGoogleBlobStoreFirehose

This firehose ingests events, similar to the StaticS3Firehose, but from Google Cloud Storage.

As with the S3 blobstore, an object is assumed to be gzipped if its name ends in .gz.

This firehose is splittable and can be used by native parallel index tasks. Since each split represents one object in this firehose, each worker task of index_parallel reads one object.

Sample spec:

"firehose" : {
    "type" : "static-google-blobstore",
    "blobs": [
        {
          "bucket": "foo",
          "path": "/path/to/your/file.json"
        },
        {
          "bucket": "bar",
          "path": "/another/path.json"
        }
    ]
}
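
As a rough sketch of how this firehose might sit inside a native parallel index task, the object below shows only the ioConfig portion of an index_parallel spec, reusing the placeholder buckets and paths from the sample above. The rest of the task spec (dataSchema, tuningConfig) is omitted, and the exact layout is an assumption that should be checked against the native batch ingestion documentation for your Druid version.

```json
{
  "type": "index_parallel",
  "firehose": {
    "type": "static-google-blobstore",
    "blobs": [
      { "bucket": "foo", "path": "/path/to/your/file.json" },
      { "bucket": "bar", "path": "/another/path.json" }
    ]
  }
}
```

With two blobs listed, such a task would have two splits, one per object.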

This firehose provides caching and prefetching features. In IndexTask, a firehose can be read twice if intervals or shardSpecs are not specified; in that case, caching can be useful. Prefetching is preferred when a direct scan of objects is slow.

| property | description | default | required? |
|----------|-------------|---------|-----------|
| type | This should be static-google-blobstore. | N/A | yes |
| blobs | JSON array of Google Blobs. | N/A | yes |
| maxCacheCapacityBytes | Maximum size of the cache space in bytes. 0 disables the cache. Cached files are not removed until the ingestion task completes. | 1073741824 | no |
| maxFetchCapacityBytes | Maximum size of the fetch space in bytes. 0 disables prefetching. Prefetched files are removed immediately once they are read. | 1073741824 | no |
| prefetchTriggerBytes | Threshold to trigger prefetching Google Blobs. | maxFetchCapacityBytes / 2 | no |
| fetchTimeout | Timeout in milliseconds for fetching a Google Blob. | 60000 | no |
| maxFetchRetry | Maximum number of retries for fetching a Google Blob. | 3 | no |

Google Blobs:

| property | description | default | required? |
|----------|-------------|---------|-----------|
| bucket | Name of the Google Cloud bucket. | N/A | yes |
| path | The path where data is located. | N/A | yes |
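
To illustrate how the optional properties combine with the required ones, here is a minimal sketch of a firehose object that overrides every tunable. The bucket, path, and all numeric values are hypothetical placeholders chosen for illustration, not recommendations: the cache is disabled, the fetch space is halved relative to the default, the prefetch trigger is set explicitly to half of the fetch space, and the fetch timeout and retry count are raised above their defaults.

```json
{
  "type": "static-google-blobstore",
  "blobs": [
    { "bucket": "foo", "path": "/path/to/your/file.json.gz" }
  ],
  "maxCacheCapacityBytes": 0,
  "maxFetchCapacityBytes": 536870912,
  "prefetchTriggerBytes": 268435456,
  "fetchTimeout": 120000,
  "maxFetchRetry": 5
}
```

Because the object name in this sketch ends in .gz, the firehose would treat it as gzipped. Any property left out of the spec falls back to the default listed in the table above.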