Introduction

The CouchbaseWriter supports writing documents to a Couchbase bucket through the Couchbase Java SDK. Note that CouchbaseWriter only supports writing to a single bucket, as there should be only one CouchbaseEnvironment per JVM.

Record format

The Couchbase writer currently supports AVRO and JSON as data inputs. Both require the following structured schema:

| Document field | Description |
|---|---|
| key | Unique key used to store the document in the bucket. For more information, see the Couchbase docs |
| data.data | Object or value containing the information associated with the key for this document |
| data.flags | Couchbase flags. To store JSON in data.data use 0x02 << 24; to store UTF-8 text use 0x04 << 24 |
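
The two flag values are plain left shifts, and they match the decimal values used in the sample records. The following is a minimal Java sketch of that arithmetic; the class and constant names are hypothetical and are not part of the Couchbase SDK or Gobblin.

```java
// Hypothetical helper that only illustrates the flag arithmetic from the table.
final class CouchbaseFlagsExample {
    // Value this document gives for JSON content: 0x02 << 24.
    static final int JSON_FLAGS = 0x02 << 24;
    // Value this document gives for UTF-8 text content: 0x04 << 24.
    static final int UTF8_TEXT_FLAGS = 0x04 << 24;

    public static void main(String[] args) {
        System.out.println(JSON_FLAGS);      // 33554432
        System.out.println(UTF8_TEXT_FLAGS); // 67108864
    }
}
```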

The following is a sample input record with JSON data:

{
  "key": "myKey123",
  "data": {
    "data": {
      "field1": "field1Value",
      "field2": 123
    },
    "flags": 33554432
  }
}

or to store plain text:

{
  "key": "myKey123",
  "data": {
    "data": "singleValueData",
    "flags": 67108864
  }
}

If using AVRO, use the following schema:

{
  "type" : "record",
  "name" : "topLevelRecord",
  "fields" : [ {
    "name" : "key",
    "type" : "string"
  }, {
    "name" : "data",
    "type" : {
      "type" : "record",
      "name" : "data",
      "namespace" : "topLevelRecord",
      "fields" : [ {
        "name" : "data",
        "type" : [ "bytes", "null" ]
      }, {
        "name" : "flags",
        "type" : "int"
      } ]
    }
  } ]
}

Note that the key can be of a type other than string if needed.

Configuration

General configuration values

| Configuration Key | Default Value | Description |
|---|---|---|
| writer.couchbase.bucket | Optional | Name of the Couchbase bucket. Change if using a bucket other than the default one |
| writer.couchbase.default | default | Name of the default bucket, used if writer.couchbase.bucket is not provided |
| writer.couchbase.dnsSrvEnabled | false | Enable DNS SRV bootstrapping (see the Couchbase docs) |
| writer.couchbase.bootstrapServers | localhost | URL of the bootstrap servers. If using DNS SRV, set writer.couchbase.dnsSrvEnabled to true |
| writer.couchbase.sslEnabled | false | Use SSL to connect to Couchbase |
| writer.couchbase.password | Optional | Bucket password. Ignored if writer.couchbase.certAuthEnabled is true |
| writer.couchbase.certAuthEnabled | false | Set to true if using certificate authentication. Must also specify writer.couchbase.sslKeystoreFile, writer.couchbase.sslKeystorePassword, writer.couchbase.sslTruststoreFile, and writer.couchbase.sslTruststorePassword |
| writer.couchbase.sslKeystoreFile | Optional | Path to the keystore file |
| writer.couchbase.sslKeystorePassword | Optional | Keystore password |
| writer.couchbase.sslTruststoreFile | Optional | Path to the truststore file |
| writer.couchbase.sslTruststorePassword | Optional | Truststore password |
| writer.couchbase.documentTTL | 0 | Time To Live of each document. Units are specified in writer.couchbase.documentTTLUnits |
| writer.couchbase.documentTTLUnits | SECONDS | Unit for writer.couchbase.documentTTL. Must be one of java.util.concurrent.TimeUnit. Case insensitive |
| writer.couchbase.documentTTLOriginField | Optional | Name of the document field that holds the origin timestamp from which the Time To Live is counted. Units are specified in writer.couchbase.documentTTLOriginUnits |
| writer.couchbase.documentTTLOriginUnits | MILLISECONDS | Unit of the timestamp stored in the field named by writer.couchbase.documentTTLOriginField. Must be one of java.util.concurrent.TimeUnit. Case insensitive. For example, an origin field value of 1568240399000 with writer.couchbase.documentTTLOriginUnits set to MILLISECONDS corresponds to Wed Sep 11 15:19:59 PDT 2019 |
| writer.couchbase.retriesEnabled | false | Enable write retries on failures |
| writer.couchbase.maxRetries | 5 | Maximum number of retries |
| writer.couchbase.failureAllowancePercentage | 0.0 | Share of failed writes you are willing to tolerate while writing to Couchbase. Gobblin marks the workunit successful and moves on if there are failures but not enough to trip the failure threshold. Only successfully acknowledged writes count as successful; all others count as failures. For example, with a value of 0.2 Gobblin moves on as long as 80% of the data is acknowledged by Couchbase. For stronger guarantees, lower the value: for 99% delivery guarantees, set it to 0.01 |
| operationTimeoutMillis | 10000 | Global timeout for Couchbase communication operations |
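
To make the failure allowance concrete, the threshold check can be pictured as a ratio comparison. The following Java sketch is an illustration only, not Gobblin's actual code: the class, the method name, and the choice of an inclusive comparison are assumptions.

```java
// Hypothetical illustration of the failure-allowance arithmetic; not Gobblin's implementation.
final class FailureAllowanceExample {
    /** Returns true if the share of failed writes stays within the allowance (for example 0.2). */
    static boolean withinAllowance(long attempted, long acknowledged, double allowance) {
        if (attempted == 0) {
            return true; // nothing was written, so nothing failed
        }
        // Any write that was not acknowledged counts as a failure.
        long failed = attempted - acknowledged;
        // Assumption: the threshold is inclusive.
        return (double) failed / attempted <= allowance;
    }

    public static void main(String[] args) {
        System.out.println(withinAllowance(1000, 850, 0.2)); // true: 15% failed
        System.out.println(withinAllowance(1000, 750, 0.2)); // false: 25% failed
        System.out.println(withinAllowance(1000, 999, 0.0)); // false: the default tolerates no failures
    }
}
```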

Authentication

No credentials

NOT RECOMMENDED FOR PRODUCTION.

Set neither writer.couchbase.certAuthEnabled nor writer.couchbase.password.

Using certificates

Set writer.couchbase.certAuthEnabled to true and provide values for writer.couchbase.sslKeystoreFile, writer.couchbase.sslKeystorePassword, writer.couchbase.sslTruststoreFile, and writer.couchbase.sslTruststorePassword.

The writer.couchbase.password setting is ignored if writer.couchbase.certAuthEnabled is set to true.
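
Because certificate authentication depends on four additional settings, it can help to check a job configuration before launching it. The following Java sketch shows one way to do that with java.util.Properties; the class and method are hypothetical, the paths and passwords are placeholders, and the writer's own validation may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; not part of Gobblin.
final class CertAuthConfigExample {
    static final String PREFIX = "writer.couchbase.";
    static final String[] REQUIRED_FOR_CERT_AUTH = {
        "sslKeystoreFile", "sslKeystorePassword", "sslTruststoreFile", "sslTruststorePassword"
    };

    /** Returns the keys that are missing when certificate authentication is enabled. */
    static List<String> missingCertAuthKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        boolean certAuth =
            Boolean.parseBoolean(props.getProperty(PREFIX + "certAuthEnabled", "false"));
        if (!certAuth) {
            return missing; // nothing extra is required
        }
        for (String name : REQUIRED_FOR_CERT_AUTH) {
            String value = props.getProperty(PREFIX + name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(PREFIX + name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PREFIX + "certAuthEnabled", "true");
        props.setProperty(PREFIX + "sslKeystoreFile", "/path/to/keystore.jks"); // placeholder path
        props.setProperty(PREFIX + "sslKeystorePassword", "changeit");          // placeholder value
        // Prints [writer.couchbase.sslTruststoreFile, writer.couchbase.sslTruststorePassword]
        System.out.println(missingCertAuthKeys(props));
    }
}
```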

Using bucket password

Set writer.couchbase.password to the bucket password.

Document level expiration

The Couchbase writer can set expiration at the document level using the expiry property of the Couchbase document. Please note that the current Couchbase implementation stores expiry timestamps as an int, which limits them to January 19, 2038 03:14:07 GMT. For simplicity, CouchbaseWriter only works with absolute timestamps and does not use relative expiration in seconds (<30 days). Currently three modes are supported:
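
The 2038 limit follows directly from storing an epoch-seconds timestamp in a signed 32-bit int. The following Java sketch shows the arithmetic; the class and method names are hypothetical and only illustrate the bound.

```java
import java.time.Instant;

// Hypothetical helper illustrating the int limit on absolute expiry timestamps.
final class ExpiryLimitExample {
    /** Latest instant whose epoch-seconds value still fits in a signed 32-bit int. */
    static Instant maxExpiry() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE);
    }

    /** True if an absolute expiry in epoch seconds can be stored as an int. */
    static boolean fitsInInt(long expiryEpochSeconds) {
        return expiryEpochSeconds >= 0 && expiryEpochSeconds <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxExpiry());            // 2038-01-19T03:14:07Z
        System.out.println(fitsInInt(1568413199L)); // true
        System.out.println(fitsInInt(2147483648L)); // false
    }
}
```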

1 - Expiration from write time

Define only writer.couchbase.documentTTL and writer.couchbase.documentTTLUnits. For example, a 2-day expiration would be configured as:

| Configuration Key | Value |
|---|---|
| writer.couchbase.documentTTL | 2 |
| writer.couchbase.documentTTLUnits | DAYS |
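
As a rough illustration of this mode, the absolute expiry is the write time plus the TTL converted to seconds. The following Java sketch assumes that interpretation; the class and method are hypothetical and do not reproduce the writer's code.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "expiration from write time"; not Gobblin's implementation.
final class WriteTimeExpiryExample {
    /** Absolute expiry in epoch seconds: the write time plus the TTL. */
    static long expiryFromWriteTime(long writeTimeMillis, long ttl, String ttlUnits) {
        // Unit names are matched case insensitively against java.util.concurrent.TimeUnit.
        TimeUnit unit = TimeUnit.valueOf(ttlUnits.trim().toUpperCase(Locale.ROOT));
        return TimeUnit.MILLISECONDS.toSeconds(writeTimeMillis) + unit.toSeconds(ttl);
    }

    public static void main(String[] args) {
        // documentTTL = 2, documentTTLUnits = DAYS
        long expiry = expiryFromWriteTime(System.currentTimeMillis(), 2, "days");
        System.out.println(expiry); // epoch seconds two days from now
    }
}
```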

2 - Expiration from an origin timestamp

Define writer.couchbase.documentTTL, writer.couchbase.documentTTLUnits, writer.couchbase.documentTTLOriginField, and writer.couchbase.documentTTLOriginUnits.

For example, a 2-day expiration counted from the header.time field, which holds a timestamp in MILLISECONDS, would be configured as:

| Configuration Key | Value |
|---|---|
| writer.couchbase.documentTTL | 2 |
| writer.couchbase.documentTTLUnits | DAYS |
| writer.couchbase.documentTTLOriginField | header.time |
| writer.couchbase.documentTTLOriginUnits | MILLISECONDS |

So a sample document whose origin is 1568240399000 milliseconds, that is epoch second 1568240399 (Wed Sep 11 15:19:59 PDT 2019), would expire at 1568413199 (Fri Sep 13 15:19:59 PDT 2019). The following is a sample record:

{
  "key": "sampleKey",
  "data": {
    "data": {
      "field1": "field1Value",
      "header": {
        "time": 1568240399000
      }
    },
    "flags": 33554432
  }
}
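
The expiry for the sample record can be reproduced by converting the origin timestamp to seconds and adding the TTL. The following Java sketch shows that calculation for an origin of 1568240399000 milliseconds and a TTL of 2 days; the class and method names are hypothetical and the writer's actual implementation may differ.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "expiration from an origin timestamp"; not Gobblin's implementation.
final class OriginExpiryExample {
    /** Absolute expiry in epoch seconds: the origin timestamp plus the TTL. */
    static long expiryFromOrigin(long origin, TimeUnit originUnits, long ttl, TimeUnit ttlUnits) {
        return originUnits.toSeconds(origin) + ttlUnits.toSeconds(ttl);
    }

    public static void main(String[] args) {
        // header.time = 1568240399000 (MILLISECONDS), documentTTL = 2 (DAYS)
        long expiry = expiryFromOrigin(1568240399000L, TimeUnit.MILLISECONDS, 2, TimeUnit.DAYS);
        System.out.println(expiry);                        // 1568413199
        System.out.println(Instant.ofEpochSecond(expiry)); // 2019-09-13T22:19:59Z
    }
}
```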
