CdapIO provides I/O transforms for CDAP plugins.
CDAP is an application platform for building and managing data applications in hybrid and multi-cloud environments. It enables developers, business analysts, and data scientists to use a visual rapid development environment and utilize common patterns, data, and application abstractions to accelerate the development of data applications, addressing a broader range of real-time and batch use cases.
CDAP plugins types:
To learn more about CDAP plugins please see io.cdap.cdap.api.annotation.Plugin and Data Integrations plugins repository.
CdapIO supports CDAP Batch plugins based on Hadoop InputFormat and OutputFormat. CDAP batch plugins support is implemented using HadoopFormatIO.
CdapIO currently supports the following CDAP Batch plugins by referencing CDAP plugin
class:
It means that all these plugins can be used like this: CdapIO.withCdapPluginClass(HubspotBatchSource.class)
CDAP Batch plugin should be based on HadoopFormat
implementation.
To add CdapIO support for a new CDAP Batch Plugin perform the following steps:
build.gradle
file. Example: implementation "io.cdap:hubspot-plugins:1.0.0"
.Plugin.createBatch()
method. Pass Cdap Plugin class and correct InputFormat
(or OutputFormat
) and InputFormatProvider
(or OutputFormatProvider
) classes to CdapIO. Example:CdapIO.withCdapPlugin( Plugin.createBatch( EmployeeBatchSource.class, EmployeeInputFormat.class, EmployeeInputFormatProvider.class));
MappingUtils
.getPluginClassByName()
method:Input/Output Format
and FormatProvider
classes. Example:if (pluginClass.equals(EmployeeBatchSource.class)){ return Plugin.createBatch(pluginClass, EmployeeInputFormat.class, EmployeeInputFormatProvider.class); }
CdapIO.withCdapPluginClass(EmployeeBatchSource.class)
To learn more, please check out complete examples.
CdapIO supports CDAP Streaming plugins based on Apache Spark Receiver. CDAP streaming plugins support is implemented using SparkReceiverIO.
Spark Receiver
.RecordId
field for Salesforce and vid
field for Hubspot plugins. For more details please see GetOffsetUtils class from examples.To add CdapIO support for a new CDAP Streaming SparkReceiver Plugin, perform the following steps:
build.gradle
file. Example: implementation "io.cdap:hubspot-plugins:1.0.0"
.Long offset
from the record of the Cdap Plugin. Example: see GetOffsetUtils class from examples.Plugin.createStreaming()
method. Pass Cdap Plugin class, correct getOffsetFn
(from step 3) and Spark Receiver
class to CdapIO. Example:CdapIO.withCdapPlugin( Plugin.createStreaming( HubspotStreamingSource.class, offsetFnForHubspot, HubspotReceiver.class)));
MappingUtils
.getPluginClassByName()
method:getOffsetFn
function and Spark Receiver
class. Example:if (pluginClass.equals(HubspotStreamingSource.class)){ return Plugin.createStreaming(pluginClass, getOffsetFnForHubpot(), HubspotReceiverClass.class); }
CdapIO.withCdapPluginClass(HubspotStreamingSource.class)
To learn more, please check out complete examples.
To use CdapIO please add a dependency on beam-sdks-java-io-cdap
.
<dependency> <groupId>org.apache.beam</groupId> <artifactId>beam-sdks-java-io-cdap</artifactId> <version>...</version> </dependency>
The documentation and usage examples are maintained in JavaDoc for CdapIO.java.