<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Documentation for Gobblin, a universal data integration framework">
<meta name="author" content="Apache Software Foundation">
<link rel="shortcut icon" href="img/favicon.ico">
<title>Home - Apache Gobblin</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="css/theme.css" type="text/css" />
<link rel="stylesheet" href="css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<link href="css/extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Home";
var mkdocs_page_input_path = "index.md";
var mkdocs_page_url = null;
</script>
<script src="js/jquery-2.1.1.min.js" defer></script>
<script src="js/modernizr-2.8.3.min.js" defer></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="." class="icon icon-home"> Apache Gobblin</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="./search.html" method="get">
<input type="text" name="q" placeholder="Search docs" title="Type search term here" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="/">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="Powered-By/">Companies Powered By Gobblin</a>
</li>
<li class="toctree-l1">
<a class="" href="Getting-Started/">Getting Started</a>
</li>
<li class="toctree-l1">
<a class="" href="Gobblin-Architecture/">Architecture</a>
</li>
<li class="toctree-l1">
<span class="caption-text">User Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="user-guide/Working-with-Job-Configuration-Files/">Job Configuration Files</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-Deployment/">Deployment</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-as-a-Library/">Gobblin as a Library</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-CLI/">Gobblin CLI</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-Compliance/">Gobblin Compliance</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-on-Yarn/">Gobblin on Yarn</a>
</li>
<li class="">
<a class="" href="user-guide/Compaction/">Compaction</a>
</li>
<li class="">
<a class="" href="user-guide/State-Management-and-Watermarks/">State Management and Watermarks</a>
</li>
<li class="">
<a class="" href="user-guide/Working-with-the-ForkOperator/">Fork Operator</a>
</li>
<li class="">
<a class="" href="user-guide/Configuration-Properties-Glossary/">Configuration Glossary</a>
</li>
<li class="">
<a class="" href="user-guide/Source-schema-and-Converters/">Source schema and Converters</a>
</li>
<li class="">
<a class="" href="user-guide/Partitioned-Writers/">Partitioned Writers</a>
</li>
<li class="">
<a class="" href="user-guide/Monitoring/">Monitoring</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-template/">Template</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-Schedulers/">Schedulers</a>
</li>
<li class="">
<a class="" href="user-guide/Job-Execution-History-Store/">Job Execution History Store</a>
</li>
<li class="">
<a class="" href="user-guide/Building-Gobblin/">Building Gobblin</a>
</li>
<li class="">
<a class="" href="user-guide/Gobblin-genericLoad/">Generic Configuration Loading</a>
</li>
<li class="">
<a class="" href="user-guide/Hive-Registration/">Hive Registration</a>
</li>
<li class="">
<a class="" href="user-guide/Config-Management/">Config Management</a>
</li>
<li class="">
<a class="" href="user-guide/Docker-Integration/">Docker Integration</a>
</li>
<li class="">
<a class="" href="user-guide/Troubleshooting/">Troubleshooting</a>
</li>
<li class="">
<a class="" href="user-guide/FAQs/">FAQs</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sources</span>
<ul class="subnav">
<li class="">
<a class="" href="sources/AvroFileSource/">Avro files</a>
</li>
<li class="">
<a class="" href="sources/CopySource/">File copy</a>
</li>
<li class="">
<a class="" href="sources/QueryBasedSource/">Query based</a>
</li>
<li class="">
<a class="" href="sources/RestApiSource/">Rest Api</a>
</li>
<li class="">
<a class="" href="sources/GoogleAnalyticsSource/">Google Analytics</a>
</li>
<li class="">
<a class="" href="sources/GoogleDriveSource/">Google Drive</a>
</li>
<li class="">
<a class="" href="sources/GoogleWebmaster/">Google Webmaster</a>
</li>
<li class="">
<a class="" href="sources/HadoopTextInputSource/">Hadoop Text Input</a>
</li>
<li class="">
<a class="" href="sources/HelloWorldSource/">Hello World</a>
</li>
<li class="">
<a class="" href="sources/HiveAvroToOrcSource/">Hive Avro-to-ORC</a>
</li>
<li class="">
<a class="" href="sources/HivePurgerSource/">Hive compliance purging</a>
</li>
<li class="">
<a class="" href="sources/SimpleJsonSource/">JSON</a>
</li>
<li class="">
<a class="" href="sources/KafkaSource/">Kafka</a>
</li>
<li class="">
<a class="" href="sources/MySQLSource/">MySQL</a>
</li>
<li class="">
<a class="" href="sources/OracleSource/">Oracle</a>
</li>
<li class="">
<a class="" href="sources/SalesforceSource/">Salesforce</a>
</li>
<li class="">
<a class="" href="sources/SftpSource/">SFTP</a>
</li>
<li class="">
<a class="" href="sources/SqlServerSource/">SQL Server</a>
</li>
<li class="">
<a class="" href="sources/TeradataSource/">Teradata</a>
</li>
<li class="">
<a class="" href="sources/WikipediaSource/">Wikipedia</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sinks (Writers)</span>
<ul class="subnav">
<li class="">
<a class="" href="sinks/AvroHdfsDataWriter/">Avro HDFS</a>
</li>
<li class="">
<a class="" href="sinks/ParquetHdfsDataWriter/">Parquet HDFS</a>
</li>
<li class="">
<a class="" href="sinks/SimpleBytesWriter/">HDFS Byte array</a>
</li>
<li class="">
<a class="" href="sinks/ConsoleWriter/">Console</a>
</li>
<li class="">
<a class="" href="sinks/CouchbaseWriter/">Couchbase</a>
</li>
<li class="">
<a class="" href="sinks/Http/">HTTP</a>
</li>
<li class="">
<a class="" href="sinks/Gobblin-JDBC-Writer/">JDBC</a>
</li>
<li class="">
<a class="" href="sinks/Kafka/">Kafka</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Adaptors</span>
<ul class="subnav">
<li class="">
<a class="" href="adaptors/Gobblin-Distcp/">Gobblin Distcp</a>
</li>
<li class="">
<a class="" href="adaptors/Hive-Avro-To-ORC-Converter/">Hive Avro-To-Orc Converter</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Case Studies</span>
<ul class="subnav">
<li class="">
<a class="" href="case-studies/Kafka-HDFS-Ingestion/">Kafka-HDFS Ingestion</a>
</li>
<li class="">
<a class="" href="case-studies/Publishing-Data-to-S3/">Publishing Data to S3</a>
</li>
<li class="">
<a class="" href="case-studies/Writing-ORC-Data/">Writing ORC Data</a>
</li>
<li class="">
<a class="" href="case-studies/Hive-Distcp/">Hive Distcp</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Data Management</span>
<ul class="subnav">
<li class="">
<a class="" href="data-management/Gobblin-Retention/">Retention</a>
</li>
<li class="">
<a class="" href="data-management/DistcpNgEvents/">Distcp-NG events</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Metrics</span>
<ul class="subnav">
<li class="">
<a class="" href="metrics/Gobblin-Metrics/">Quick Start</a>
</li>
<li class="">
<a class="" href="metrics/Existing-Reporters/">Existing Reporters</a>
</li>
<li class="">
<a class="" href="metrics/Metrics-for-Gobblin-ETL/">Metrics for Gobblin ETL</a>
</li>
<li class="">
<a class="" href="metrics/Gobblin-Metrics-Architecture/">Gobblin Metrics Architecture</a>
</li>
<li class="">
<a class="" href="metrics/Implementing-New-Reporters/">Implementing New Reporters</a>
</li>
<li class="">
<a class="" href="metrics/Gobblin-Metrics-Performance/">Gobblin Metrics Performance</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Developer Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="developer-guide/Customization-for-New-Source/">Customization for New Source</a>
</li>
<li class="">
<a class="" href="developer-guide/Customization-for-Converter-and-Operator/">Customization for Converter and Operator</a>
</li>
<li class="">
<a class="" href="developer-guide/CodingStyle/">Code Style Guide</a>
</li>
<li class="">
<a class="" href="developer-guide/Gobblin-Compliance-Design/">Gobblin Compliance Design</a>
</li>
<li class="">
<a class="" href="developer-guide/IDE-setup/">IDE setup</a>
</li>
<li class="">
<a class="" href="developer-guide/Monitoring-Design/">Monitoring Design</a>
</li>
<li class="">
<a class="" href="developer-guide/Documentation-Architecture/">Documentation Architecture</a>
</li>
<li class="">
<a class="" href="developer-guide/Contributing/">Contributing</a>
</li>
<li class="">
<a class="" href="developer-guide/GobblinModules/">Gobblin Modules</a>
</li>
<li class="">
<a class="" href="developer-guide/HighLevelConsumer/">High Level Consumer</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Project</span>
<ul class="subnav">
<li class="">
<a class="" href="project/Feature-List/">Feature List</a>
</li>
<li class="">
<a class="" href="/people">Contributors and Team</a>
</li>
<li class="">
<a class="" href="project/Talks-and-Tech-Blogs/">Talks and Tech Blog Posts</a>
</li>
<li class="">
<a class="" href="project/Posts/">Posts</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Miscellaneous</span>
<ul class="subnav">
<li class="">
<a class="" href="miscellaneous/Camus-to-Gobblin-Migration/">Camus to Gobblin Migration</a>
</li>
<li class="">
<a class="" href="miscellaneous/Exactly-Once-Support/">Exactly Once Support</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href=".">Apache Gobblin</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href=".">Docs</a> &raquo;</li>
<li>Home</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<p align="center">
<img src="img/Gobblin-Logo.png" alt="Gobblin Logo" height="200" width="400">
</p>
<p>Over the years, LinkedIn's data infrastructure team built custom solutions for ingesting diverse data entities into our Hadoop ecosystem. At one point, we were running 15 types of ingestion pipelines, which created significant data quality, metadata management, development, and operational challenges.</p>
<p>Our experiences and challenges motivated us to build <em>Gobblin</em>. Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources (databases, REST APIs, FTP/SFTP servers, filers, and so on) onto Hadoop. Gobblin handles the routine tasks common to all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, and data publishing. Gobblin ingests data from different data sources in the same execution framework and manages the metadata of different sources in one place. This, combined with other features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, makes Gobblin an easy-to-use, self-service, and efficient data ingestion framework.</p>
<p>You can find a lot of useful resources in our wiki pages, including <a href="Getting-Started">how to get started with Gobblin</a>, an <a href="Gobblin-Architecture">architecture overview of Gobblin</a>, and
the <a href="user-guide/Gobblin-Deployment">Gobblin user guide</a>. We also provide a discussion group: <a href="https://groups.google.com/forum/#!forum/gobblin-users" rel="nofollow">Google Gobblin-Users Group</a>. Please feel free to post any questions or comments.</p>
<p>For a detailed overview, please take a look at the <a href="http://www.vldb.org/pvldb/vol8/p1764-qiao.pdf" rel="nofollow">VLDB 2015 paper</a> and <a href="https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease" rel="nofollow">LinkedIn's Gobblin blog post</a>.</p>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org" rel="nofollow">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme" rel="nofollow">theme</a> provided by <a href="https://readthedocs.org" rel="nofollow">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
</span>
</div>
<script>var base_url = '.';</script>
<script src="js/theme.js" defer></script>
<script src="js/extra.js" defer></script>
<script src="search/main.js" defer></script>
</body>
</html>
<!--
MkDocs version : 1.0.4
Build Date UTC : 2020-12-06 23:32:38
-->