A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

Apache Gobblin

Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources (databases, REST APIs, FTP/SFTP servers, filers, etc.) onto Hadoop.

Apache Gobblin handles the common routine tasks required for all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, data publishing, etc.

Gobblin ingests data from different data sources in the same execution framework and manages the metadata of those sources in one place. Combined with features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, this makes Gobblin an easy-to-use, self-service, and efficient data ingestion framework.

Requirements

  • Java >= 1.8

If building the distribution with tests turned on:

  • Maven version 3.5.3
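
The Java requirement above can be checked before starting a build. The following is a minimal sketch, not part of Gobblin: `java_version_ok` is a hypothetical helper name, and it assumes the version string has already been extracted (for example from the quoted value that `java -version` prints). It accepts both the legacy `1.x` scheme and the newer `9`, `11`, `17` style, and rejects suffixed strings such as `17-ea`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gobblin): succeeds when a Java version
# string such as "1.8.0_292", "11.0.2" or "17" is at least 1.8.
java_version_ok() {
  version="$1"
  major="${version%%.*}"          # text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;      # empty or not a plain number: reject
  esac
  if [ "$major" -eq 1 ]; then
    # Legacy scheme "1.x": compare the second component against 8.
    rest="${version#*.}"
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    [ -n "$minor" ] && [ "$minor" -ge 8 ]
  else
    # Newer scheme (9, 10, 11, ...): any major >= 9 satisfies ">= 1.8".
    [ "$major" -ge 9 ]
  fi
}
```

One possible use is `java_version_ok "1.8.0_292" && echo "Java is new enough"`.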

Instructions to run Apache RAT (Release Audit Tool)

  1. Extract the archive file to your local directory.
  2. Download gradle-wrapper.jar (version 2.13) and place it in the gradle/wrapper folder. See ‘Instructions to download gradle wrapper’ above.
  3. Run ./gradlew rat. The report will be generated at build/rat/rat-report.html.
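
The RAT steps above can be wrapped in a small script that refuses to run when the wrapper jar from step 2 is missing. This is a sketch under stated assumptions: `wrapper_jar_present` and `run_rat` are hypothetical helper names, and the argument is assumed to be the directory where the archive was extracted.

```shell
#!/bin/sh
# Hypothetical helper: check that the wrapper jar from step 2 is in place
# under the given source directory (defaults to the current directory).
wrapper_jar_present() {
  src_dir="${1:-.}"
  [ -f "$src_dir/gradle/wrapper/gradle-wrapper.jar" ]
}

# Hypothetical helper: run the RAT task from step 3 and print the path
# where the report is expected to be generated.
run_rat() {
  src_dir="${1:-.}"
  if ! wrapper_jar_present "$src_dir"; then
    echo "gradle/wrapper/gradle-wrapper.jar not found in $src_dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$src_dir" && ./gradlew rat) || return 1
  echo "$src_dir/build/rat/rat-report.html"
}
```

Calling `run_rat /path/to/extracted/source` would then either fail early with a message about the missing jar or print the expected report location.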

Instructions to build the distribution

  1. Extract the archive file to your local directory.
  2. Download gradle-wrapper.jar (version 2.13) and place it in the gradle/wrapper folder. See ‘Instructions to download gradle wrapper’ above.
  3. To skip tests and build the distribution, run ./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain. The distribution will be created in the build/gobblin-distribution/distributions directory.
  4. Alternatively, to run tests and build the distribution (requires Maven), run ./gradlew build. The distribution will be created in the build/gobblin-distribution/distributions directory.
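
Steps 3 and 4 are alternatives, so a script only needs to pick one of the two Gradle invocations. The sketch below only prints the command for the chosen mode and does not run it. The function name `gobblin_build_cmd` and the mode names `skip-tests` and `with-tests` are assumptions made for illustration; the Gradle arguments themselves are copied from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print the Gradle invocation for the chosen mode.
#   skip-tests -> step 3 (no tests, no findbugs/rat/checkstyle)
#   with-tests -> step 4 (full build, requires Maven)
gobblin_build_cmd() {
  case "$1" in
    skip-tests)
      echo "./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain" ;;
    with-tests)
      echo "./gradlew build" ;;
    *)
      echo "usage: gobblin_build_cmd skip-tests|with-tests" >&2
      return 2 ;;
  esac
}
```

From the extracted source directory, `eval "$(gobblin_build_cmd skip-tests)"` would run the faster build. In both modes the distribution is expected under build/gobblin-distribution/distributions.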

Quick Links