tree: 7e05f133491d6404b4b184cf4d4de11cb9bf8684 [path history] [tgz]
  1. src/
  2. build.gradle
  3. README.md
sdks/java/io/cdap/README.md

CdapIO

CdapIO provides I/O transforms for CDAP plugins.

What is CDAP?

CDAP is an application platform for building and managing data applications in hybrid and multi-cloud environments. It enables developers, business analysts, and data scientists to use a visual rapid development environment and utilize common patterns, data, and application abstractions to accelerate the development of data applications, addressing a broader range of real-time and batch use cases.

CDAP plugins types:

  • Batch source
  • Batch sink
  • Streaming source

To learn more about CDAP plugins please see io.cdap.cdap.api.annotation.Plugin and Data Integrations plugins repository.

CDAP Batch plugins support in CDAP IO

CdapIO supports CDAP Batch plugins based on Hadoop InputFormat and OutputFormat. CDAP batch plugins support is implemented using HadoopFormatIO.

CdapIO currently supports the following CDAP Batch plugins by referencing CDAP plugin class:

It means that all these plugins can be used like this: CdapIO.withCdapPluginClass(HubspotBatchSource.class)

Requirements for Cdap Batch plugins

CDAP Batch plugin should be based on HadoopFormat implementation.

How to add support for a new CDAP Batch plugin

To add CdapIO support for a new CDAP Batch Plugin perform the following steps:

  1. Find CDAP plugin artifacts in the Maven Central repository. Example: Hubspot plugin Maven repository. Note: To add a custom CDAP plugin, please follow Sonatype publishing guidelines.
  2. Add the CDAP plugin Maven dependency to the build.gradle file. Example: implementation "io.cdap:hubspot-plugins:1.0.0".
  3. Here are two ways of using CDAP batch plugin with CdapIO:
    1. Using Plugin.createBatch() method. Pass Cdap Plugin class and correct InputFormat (or OutputFormat) and InputFormatProvider (or OutputFormatProvider) classes to CdapIO. Example:
    CdapIO.withCdapPlugin(
       Plugin.createBatch(
       EmployeeBatchSource.class,
       EmployeeInputFormat.class,
       EmployeeInputFormatProvider.class));
    
    1. Using MappingUtils.
      1. Navigate to MappingUtils class.
      2. Modify getPluginClassByName() method:
      3. Add the code for mapping Cdap Plugin class name and Input/Output Format and FormatProvider classes. Example:
      if (pluginClass.equals(EmployeeBatchSource.class)){
         return Plugin.createBatch(pluginClass,
                       EmployeeInputFormat.class,
                       EmployeeInputFormatProvider.class);
      }
      
      1. After these steps you will be able to use Cdap Plugin by class name like this: CdapIO.withCdapPluginClass(EmployeeBatchSource.class)

To learn more, please check out complete examples.

CDAP Streaming plugins support in CDAP IO

CdapIO supports CDAP Streaming plugins based on Apache Spark Receiver. CDAP streaming plugins support is implemented using SparkReceiverIO.

Requirements for Cdap Streaming plugins

  1. CDAP Streaming plugin should be based on Spark Receiver.
  2. CDAP Streaming plugin should support work with offsets.
    1. Corresponding Spark Receiver should implement HasOffset interface.
    2. Records should have the numeric field that represents record offset. Example: RecordId field for Salesforce and vid field for Hubspot plugins. For more details please see GetOffsetUtils class from examples.

How to add support for a new CDAP Streaming plugin

To add CdapIO support for a new CDAP Streaming SparkReceiver Plugin, perform the following steps:

  1. Find CDAP plugin artifacts in the Maven Central repository. Example: Hubspot plugin Maven repository. Note: To add a custom CDAP plugin, please follow Sonatype publishing guidelines.
  2. Add CDAP plugin Maven dependency to the build.gradle file. Example: implementation "io.cdap:hubspot-plugins:1.0.0".
  3. Implement function that will define how to get Long offset from the record of the Cdap Plugin. Example: see GetOffsetUtils class from examples.
  4. Here are two ways of using Cdap streaming Plugin with CdapIO:
    1. Using Plugin.createStreaming() method. Pass Cdap Plugin class, correct getOffsetFn (from step 3) and Spark Receiver class to CdapIO. Example:
    CdapIO.withCdapPlugin(
       Plugin.createStreaming(
       HubspotStreamingSource.class,
       offsetFnForHubspot,
       HubspotReceiver.class)));
    
    1. Using MappingUtils.
      1. Navigate to MappingUtils class.
      2. Modify getPluginClassByName() method:
      3. Add the code for mapping Cdap Plugin class name, getOffsetFn function and Spark Receiver class. Example:
      if (pluginClass.equals(HubspotStreamingSource.class)){
         return Plugin.createStreaming(pluginClass,
                       getOffsetFnForHubpot(),
                       HubspotReceiverClass.class);
      }
      
      1. After these steps you will be able to use Cdap Plugin by class name like this: CdapIO.withCdapPluginClass(HubspotStreamingSource.class)

To learn more, please check out complete examples.

Dependencies

To use CdapIO please add a dependency on beam-sdks-java-io-cdap.

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-cdap</artifactId>
    <version>...</version>
</dependency>

Documentation

The documentation and usage examples are maintained in JavaDoc for CdapIO.java.