commit	9bcbfa2b0cfc553d0c42ccf1e17ce66424fb86cb
author	YingDai <ydai1124@users.noreply.github.com>	Fri Sep 02 15:07:57 2016 -0700
committer	GitHub <noreply@github.com>	Fri Sep 02 15:07:57 2016 -0700
tree	0be5795d8f2d23609e52ec54b4c42d3406ff309f
parent	041fc719e7cf4ba8170ae1c4ad9fcd8e09c27afd

README.md

Gobblin

Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources (e.g., databases, REST APIs, FTP/SFTP servers, and filers) onto Hadoop. Gobblin handles the routine tasks common to all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, and data publishing. Gobblin ingests data from different data sources in the same execution framework and manages the metadata of those sources in one place. This, combined with other features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, makes Gobblin an easy-to-use, self-service, and efficient data ingestion framework.
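To make the routine tasks listed above more concrete, here is a minimal conceptual sketch in Python of a partitioned extract, transform, quality-check, and publish flow with per-task error handling. It is not Gobblin's API: every name in it (`TaskResult`, `run_task`, `run_job`, and the sample records) is hypothetical and chosen only for illustration. It assumes each partition is published only if all of its records pass the quality check.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class TaskResult:
    """Outcome of one partition: the published records, or the error that stopped it."""
    published: List[dict] = field(default_factory=list)
    error: str = ""


def run_task(records: Iterable[dict],
             transform: Callable[[dict], dict],
             is_valid: Callable[[dict], bool]) -> TaskResult:
    """Run extract -> transform -> quality check -> publish for one partition."""
    result = TaskResult()
    staged = []
    try:
        for record in records:                 # extract
            converted = transform(record)      # transform
            if not is_valid(converted):        # data quality check
                raise ValueError(f"quality check failed: {converted}")
            staged.append(converted)
    except Exception as exc:                   # error handling: fail this task only
        result.error = str(exc)
        return result
    result.published = staged                  # publish only a fully valid partition
    return result


def run_job(partitions, transform, is_valid) -> List[TaskResult]:
    """Task partitioning: each partition becomes one independent task."""
    return [run_task(p, transform, is_valid) for p in partitions]


# Hypothetical input: two partitions, the second containing a bad record.
partitions = [
    [{"id": 1, "name": " ada "}, {"id": 2, "name": "bob"}],
    [{"id": 3, "name": ""}],
]
results = run_job(
    partitions,
    transform=lambda r: {**r, "name": r["name"].strip()},
    is_valid=lambda r: bool(r["name"]),
)
```

In this sketch the first partition is published in full, while the second is rejected as a whole and its error is recorded, so one failing task does not affect the others.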

Quick Links

Documentation: Check out the Gobblin documentation for a complete description of Gobblin's features
Powered By: Check out the list of companies known to use Gobblin
Architecture: The Gobblin Architecture page has a full explanation of Gobblin's architecture
Getting Started with Gobblin: Refer to the Getting Started Guide to learn how to get started with Gobblin
Building Gobblin: Refer to the page Building Gobblin for directions on how to build Gobblin
Javadocs: The full Javadocs for each released version of Gobblin can be found here