Introduction

Writing to an HTTP-based sink is done by sending an HTTP or RESTful request and handling the response. Given the endpoint URI, query parameters, and body, it is straightforward to construct an HTTP request. The idea is to build a writer that writes HTTP records, each of which contains those elements of a request. The writer builds an HTTP or REST request from one or more HTTP records, sends the request with a client that knows the server, and handles the response.

Note

The old HTTP write framework under AbstractHttpWriter and AbstractHttpWriterBuilder is deprecated (deprecation date: 05/15/2018). Use AsyncHttpWriter and AsyncHttpWriterBuilder instead.

Constructs

HttpOperation

An HTTP record is represented as an HttpOperation object. It has 4 fields.

| Field Name | Description | Example |
|---|---|---|
| keys | Optional, a key/value map to interpolate the URL template | {"memberId": "123"} |
| queryParams | Optional, a map from query parameter to its value | {"action": "update"} |
| headers | Optional, a map from header key to its value | {"version": "2.0"} |
| body | Optional, the request body in string or JSON string format | "{\"email\": \"httpwrite@test.com\"}" |

Given a URL template, http://www.test.com/profiles/${memberId}, from the job configuration, the example keys and queryParams above resolve the request URL to http://www.test.com/profiles/123?action=update.
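
The resolution step above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: UrlTemplateSketch and its resolve method are hypothetical names, not Gobblin classes, key values are substituted verbatim, and query parameters are form-encoded with URLEncoder, which may differ from what the framework actually does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Gobblin: resolves a URL template from keys and queryParams.
class UrlTemplateSketch {

  static String resolve(String template, Map<String, String> keys, Map<String, String> queryParams) {
    String url = template;
    // Replace each ${name} placeholder with its value (substituted verbatim in this sketch).
    for (Map.Entry<String, String> key : keys.entrySet()) {
      url = url.replace("${" + key.getKey() + "}", key.getValue());
    }
    if (queryParams.isEmpty()) {
      return url;
    }
    // Append the query parameters as name=value pairs joined by '&'.
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> param : queryParams.entrySet()) {
      query.add(encode(param.getKey()) + "=" + encode(param.getValue()));
    }
    return url + "?" + query.toString();
  }

  // Form-encodes a query parameter name or value.
  private static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> keys = new LinkedHashMap<>();
    keys.put("memberId", "123");
    Map<String, String> queryParams = new LinkedHashMap<>();
    queryParams.put("action", "update");
    // Prints http://www.test.com/profiles/123?action=update
    System.out.println(resolve("http://www.test.com/profiles/${memberId}", keys, queryParams));
  }
}
```

A placeholder that has no matching entry in keys is left in the URL unchanged by this sketch; a real writer would probably want to reject such a record.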

AsyncRequestBuilder

An AsyncRequestBuilder builds an AsyncRequest from a collection of HttpOperation records. It can build one request per record or batch multiple records into a single request. The builder is also responsible for setting the headers and the body on the request.
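
The choice between one request per record and batching can be pictured as grouping the buffered records before requests are built. The sketch below is an assumption-laden illustration: BatchingSketch and toBatches are hypothetical names, and the real AsyncRequestBuilder interface may decide batch boundaries differently (for example by payload size).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Gobblin class: groups records into batches of at most batchSize.
class BatchingSketch {

  static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be at least 1");
    }
    List<List<T>> batches = new ArrayList<>();
    // Each slice of up to batchSize records would become the payload of one request.
    for (int start = 0; start < records.size(); start += batchSize) {
      int end = Math.min(start + batchSize, records.size());
      batches.add(new ArrayList<>(records.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList("record-1", "record-2", "record-3");
    // batchSize 1 models "one request per record"; a larger value models batching.
    System.out.println(toBatches(records, 1));
    System.out.println(toBatches(records, 2));
  }
}
```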

HttpClient

An HttpClient sends a request and returns a response. If necessary, it should set up the connection to the server, for example by sending an authorization request to obtain an access token. How authorization is done depends on the use case. Gobblin does not provide general support for authorization yet.

ResponseHandler

A ResponseHandler handles the response to a request. It returns a ResponseStatus object to the framework, which resends the request if the status is SERVER_ERROR.
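
The resend behavior can be modeled with a small loop. The following is a sketch, not the framework's code: RetrySketch, its StatusType enum, and the mapping from status codes to categories are assumptions made for illustration (only SERVER_ERROR is named in the text), and the attempt limit stands in for the maxAttempts setting described later.

```java
import java.util.function.IntSupplier;

// Hypothetical model of "resend on SERVER_ERROR"; not Gobblin's implementation.
class RetrySketch {

  // Assumed status categories; only SERVER_ERROR is named in the text.
  enum StatusType { OK, CLIENT_ERROR, SERVER_ERROR }

  // Assumed mapping from an HTTP status code to a category.
  static StatusType classify(int statusCode) {
    if (statusCode >= 500) {
      return StatusType.SERVER_ERROR;
    }
    if (statusCode >= 400) {
      return StatusType.CLIENT_ERROR;
    }
    return StatusType.OK;
  }

  // Sends until the status is no longer SERVER_ERROR or maxAttempts sends were made.
  static StatusType sendWithRetry(IntSupplier send, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    StatusType status = StatusType.SERVER_ERROR;
    for (int attempt = 1; attempt <= maxAttempts && status == StatusType.SERVER_ERROR; attempt++) {
      status = classify(send.getAsInt());
    }
    return status;
  }

  public static void main(String[] args) {
    int[] responses = {503, 503, 200}; // simulated server replies
    int[] sent = {0};
    StatusType result = sendWithRetry(() -> responses[sent[0]++], 3);
    // Prints "OK after 3 attempts"
    System.out.println(result + " after " + sent[0] + " attempts");
  }
}
```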

Build an asynchronous writer

AsyncHttpWriterBuilder is the base builder for an asynchronous HTTP writer. A specific writer can be created by providing the 3 major components: an HttpClient, an AsyncRequestBuilder, and a ResponseHandler.

Gobblin offers 2 implementations of async HTTP writers. As long as your write requirement can be expressed as an HttpOperation through a Converter, the 2 implementations should work through configuration alone.

AvroHttpWriterBuilder

An AvroHttpWriterBuilder builds an AsyncHttpWriter on top of the Apache HttpComponents framework, sending plain HTTP requests. The 3 major components are:

  • ApacheHttpClient. It uses CloseableHttpClient to send an HttpUriRequest and receive a CloseableHttpResponse
  • ApacheHttpRequestBuilder. It builds an ApacheHttpRequest, which is an AsyncRequest that wraps the HttpUriRequest, from one HttpOperation
  • ApacheHttpResponseHandler. It handles an HttpResponse

Configurations for the builder are:

| Configuration | Description | Example |
|---|---|---|
| gobblin.writer.http.urlTemplate | Required, the URL template (scheme and port included), together with keys and queryParams, to be resolved to the request URL | http://www.test.com/profiles/${memberId} |
| gobblin.writer.http.verb | Required, HTTP verbs | get, update, delete, etc. |
| gobblin.writer.http.errorCodeWhitelist | Optional, HTTP error codes allowed to pass through | 404, 500, etc. No error code is allowed by default |
| gobblin.writer.http.maxAttempts | Optional, max number of attempts including the initial send | Default is 3 |
| gobblin.writer.http.contentType | Optional, content type of the request body | "application/json", which is the default value |
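
To make the required settings and defaults concrete, here is a sketch of how they could be read from java.util.Properties. The keys and default values are taken from the table above; everything else is an assumption: HttpWriterConfigSketch is a hypothetical class that does not exist in Gobblin, and the whitelist is assumed to be a comma-separated list of status codes.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical reader for the settings in the table above; not part of Gobblin.
class HttpWriterConfigSketch {
  static final String PREFIX = "gobblin.writer.http.";

  final String urlTemplate;
  final String verb;
  final Set<Integer> errorCodeWhitelist;
  final int maxAttempts;
  final String contentType;

  HttpWriterConfigSketch(Properties props) {
    urlTemplate = required(props, "urlTemplate");
    verb = required(props, "verb");
    // Optional settings fall back to the documented defaults.
    errorCodeWhitelist = parseCodes(props.getProperty(PREFIX + "errorCodeWhitelist", ""));
    maxAttempts = Integer.parseInt(props.getProperty(PREFIX + "maxAttempts", "3").trim());
    contentType = props.getProperty(PREFIX + "contentType", "application/json");
  }

  private static String required(Properties props, String name) {
    String value = props.getProperty(PREFIX + name);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("Missing required setting " + PREFIX + name);
    }
    return value.trim();
  }

  // Assumption: the whitelist is a comma-separated list of status codes such as "404, 500".
  private static Set<Integer> parseCodes(String raw) {
    Set<Integer> codes = new LinkedHashSet<>();
    for (String part : raw.split(",")) {
      if (!part.trim().isEmpty()) {
        codes.add(Integer.parseInt(part.trim()));
      }
    }
    return codes;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(PREFIX + "urlTemplate", "http://www.test.com/profiles/${memberId}");
    props.setProperty(PREFIX + "verb", "get");
    props.setProperty(PREFIX + "errorCodeWhitelist", "404, 500");
    HttpWriterConfigSketch config = new HttpWriterConfigSketch(props);
    System.out.println(config.verb + " " + config.urlTemplate);
    System.out.println("maxAttempts=" + config.maxAttempts + ", contentType=" + config.contentType);
    System.out.println("whitelist=" + config.errorCodeWhitelist);
  }
}
```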

R2RestWriterBuilder

An R2RestWriterBuilder builds an AsyncHttpWriter on top of the rest.li R2 framework, sending REST requests. The 3 major components are:

  • R2Client. It uses an R2 Client to send a RestRequest and receive a RestResponse
  • R2RestRequestBuilder. It builds an R2Request, which is an AsyncRequest that wraps the RestRequest, from one HttpOperation
  • R2RestResponseHandler. It handles a RestResponse

R2RestWriterBuilder supports d2 and SSL. Configurations for the builder are listed below; the (d2.) part of a key should be added in d2 mode:

| Configuration | Description | Example |
|---|---|---|
| gobblin.writer.http.urlTemplate | Required, the URL template (scheme and port included), together with keys and queryParams, to be resolved to the request URL. If the scheme is d2, d2 is enabled | http://www.test.com/profiles/${memberId} |
| gobblin.writer.http.verb | Required, REST (rest.li) verbs | get, update, put, delete, etc. |
| gobblin.writer.http.maxAttempts | Optional, max number of attempts including the initial send | Default is 3 |
| gobblin.writer.http.errorCodeWhitelist | Optional, HTTP error codes allowed to pass through | 404, 500, etc. No error code is allowed by default |
| gobblin.writer.http.d2.zkHosts | Required for d2, the ZooKeeper address | |
| gobblin.writer.http.(d2.)ssl | Optional, enable SSL | Default is false |
| gobblin.writer.http.(d2.)keyStoreFilePath | Required for SSL | /tmp/identity.p12 |
| gobblin.writer.http.(d2.)keyStoreType | Required for SSL | PKCS12 |
| gobblin.writer.http.(d2.)keyStorePassword | Required for SSL | |
| gobblin.writer.http.(d2.)trustStoreFilePath | Required for SSL | |
| gobblin.writer.http.(d2.)trustStorePassword | Required for SSL | |
| gobblin.writer.http.protocolVersion | Optional, protocol version of rest.li | 2.0.0, which is the default value |
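
One reading of the (d2.) convention is that the same setting lives under a different key depending on whether d2 is enabled. The sketch below illustrates that reading only; D2KeySketch and its methods are hypothetical, the d2 URL in the example is made up, and checking for a "d2://" prefix is an assumption about how the d2 scheme is detected.

```java
// Hypothetical helper, not part of Gobblin: derives the configuration key for a setting.
class D2KeySketch {
  static final String PREFIX = "gobblin.writer.http.";

  // Assumption: d2 mode is detected from a "d2://" scheme in the URL template.
  static boolean isD2(String urlTemplate) {
    return urlTemplate.startsWith("d2://");
  }

  // Adds the "d2." part to the key only in d2 mode.
  static String key(String name, boolean d2Mode) {
    return PREFIX + (d2Mode ? "d2." : "") + name;
  }

  public static void main(String[] args) {
    String plainTemplate = "http://www.test.com/profiles/${memberId}";
    String d2Template = "d2://profiles/${memberId}"; // hypothetical d2 URL template
    System.out.println(key("ssl", isD2(plainTemplate)));
    System.out.println(key("ssl", isD2(d2Template)));
  }
}
```

Under this reading, only the settings written with (d2.) in the table take the extra part; urlTemplate, verb, maxAttempts, errorCodeWhitelist, and protocolVersion keep the same key in both modes.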

R2RestWriterBuilder isn't integrated with PasswordManager to process encrypted passwords yet. The task is tracked as https://issues.apache.org/jira/browse/GOBBLIN-487

Build a synchronous writer

The idea is to reuse an asynchronous writer to build its synchronous version. The technical difference between them is the number of outstanding writes allowed. Set gobblin.writer.http.maxOutstandingWrites to 1 (the default value is 1000) to make a synchronous writer.
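
Why a limit of 1 yields synchronous behavior can be shown with a small model in which a semaphore caps the number of writes in flight. This is a sketch of the idea, not Gobblin's implementation: OutstandingWritesSketch, its thread pool size, and its bookkeeping counters are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of bounding outstanding writes; not Gobblin's implementation.
class OutstandingWritesSketch {
  private final Semaphore permits;
  private final ExecutorService sender = Executors.newFixedThreadPool(4);
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger maxObserved = new AtomicInteger();

  OutstandingWritesSketch(int maxOutstandingWrites) {
    permits = new Semaphore(maxOutstandingWrites);
  }

  // Blocks the caller while maxOutstandingWrites writes are still in flight.
  void write(Runnable send) {
    permits.acquireUninterruptibly();
    sender.execute(() -> {
      int now = inFlight.incrementAndGet();
      maxObserved.accumulateAndGet(now, Math::max);
      try {
        send.run();
      } finally {
        inFlight.decrementAndGet();
        permits.release();
      }
    });
  }

  // Waits for submitted writes to finish and returns the highest number seen in flight at once.
  int closeAndGetMaxInFlight() {
    sender.shutdown();
    try {
      sender.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for writes", e);
    }
    return maxObserved.get();
  }

  public static void main(String[] args) {
    OutstandingWritesSketch writer = new OutstandingWritesSketch(1);
    AtomicInteger sent = new AtomicInteger();
    for (int i = 0; i < 100; i++) {
      writer.write(sent::incrementAndGet);
    }
    // With a limit of 1, a write is only submitted after the previous one has completed.
    System.out.println("max in flight: " + writer.closeAndGetMaxInFlight() + ", sent: " + sent.get());
  }
}
```

With a larger limit the caller only blocks once that many writes are pending, which is the asynchronous behavior; with a limit of 1 each write waits for the previous one to complete.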