[GOBBLIN-1670] Remove rat tasks and unneeded checkstyles blocking build pipeline (#3529)
* Remove rat task on ci-cd
* Disable checkstyle for generated code
* Remove checkstyle on generated rest files as well
* Fix checkstyle for generated java code
* Remove checkstyle for gobblin-metrics-base generated code
* Remove checkstyle for generated code in test utils and http
7 files changed
tree: 4005d775f7329180244fe914395cb7d093c8b692
- .github/
- bin/
- buildSrc/
- conf/
- config/
- dev/
- gobblin-admin/
- gobblin-all/
- gobblin-api/
- gobblin-audit/
- gobblin-aws/
- gobblin-binary-management/
- gobblin-cluster/
- gobblin-compaction/
- gobblin-completeness/
- gobblin-config-management/
- gobblin-core/
- gobblin-core-base/
- gobblin-data-management/
- gobblin-distribution/
- gobblin-docker/
- gobblin-docs/
- gobblin-example/
- gobblin-hive-registration/
- gobblin-iceberg/
- gobblin-kubernetes/
- gobblin-metastore/
- gobblin-metrics-libs/
- gobblin-modules/
- gobblin-oozie/
- gobblin-rest-service/
- gobblin-restli/
- gobblin-runtime/
- gobblin-runtime-hadoop/
- gobblin-salesforce/
- gobblin-service/
- gobblin-test/
- gobblin-test-harness/
- gobblin-test-utils/
- gobblin-tunnel/
- gobblin-utility/
- gobblin-yarn/
- gradle/
- ligradle/
- maven-nexus/
- maven-sonatype/
- .asf.yaml
- .codecov_bash
- .dockerignore
- .gitignore
- build.gradle
- CHANGELOG.md
- defaultEnvironment.gradle
- gobblin-flavored-build.gradle
- gradle.properties
- gradlew
- gradlew.bat
- HEADER
- LICENSE
- mkdocs.yml
- NOTICE
- query_github_issues.py
- README.md
- readthedocs.yml
- settings.gradle
README.md
Apache Gobblin
Apache Gobblin is a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems.
Capabilities
- Ingestion and export of data from a variety of sources and sinks into and out of the data lake. Gobblin is optimized and designed for ELT patterns with inline transformations on ingest (small t).
- Data Organization within the lake (e.g. compaction, partitioning, deduplication)
- Lifecycle Management of data within the lake (e.g. data retention)
- Compliance Management of data across the ecosystem (e.g. fine-grained data deletions)
Highlights
- Battle tested at scale: Runs in production at petabyte-scale at companies like LinkedIn, PayPal, Verizon etc.
- Feature rich: Supports task partitioning, state management for incremental processing, atomic data publishing, data quality checking, job scheduling, fault tolerance etc.
- Supports stream and batch execution modes
- Control Plane (Gobblin-as-a-service) supports programmatic triggering and orchestration of data plane operations.
Common Patterns used in production
- Stream / Batch ingestion of Kafka to Data Lake (HDFS, S3, ADLS)
- Bulk-loading serving stores from the Data Lake (e.g. HDFS -> Couchbase)
- Support for data sync across Federated Data Lake (HDFS <-> HDFS, HDFS <-> S3, S3 <-> ADLS)
- Integration of external vendor APIs (e.g. Salesforce, Dynamics etc.) with data stores (HDFS, Couchbase etc.)
- Enforcing Data retention policies and GDPR deletion on HDFS / ADLS
Apache Gobblin is NOT
- A general-purpose data transformation engine like Spark or Flink. Gobblin can delegate complex data-processing tasks to Spark, Hive etc.
- A data storage system like Apache Kafka or HDFS. Gobblin integrates with these systems as sources or sinks.
- A general-purpose workflow execution system like Airflow, Azkaban, Dagster, or Luigi.
Requirements
If building the distribution with tests turned on:
Instructions to run Apache RAT (Release Audit Tool)
- Extract the archive file to your local directory.
- Run the command below. The report will be generated under build/rat/rat-report.html.
./gradlew rat
Instructions to build the distribution
- Extract the archive file to your local directory.
- Skip tests and build the distribution: Run
./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain
The distribution will be created in the build/gobblin-distribution/distributions directory.
(or)
- Run tests and build the distribution (requires Maven): Run
./gradlew build
The distribution will be created in the build/gobblin-distribution/distributions directory.
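The build steps can be wrapped in a small script that runs the skip-tests build and then checks that something landed in the distribution directory. This is a minimal sketch, not part of the Gobblin repository: the function names `build_without_tests` and `list_distributions` are hypothetical, and only the Gradle invocation and the output path are taken from the instructions.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of the Gobblin repo.

# Directory where the build instructions say the distribution is created.
DIST_SUBDIR="build/gobblin-distribution/distributions"

# Build the distribution without tests and static checks, using the
# Gradle invocation quoted in the build instructions.
# $1: path to the extracted source tree (the directory containing gradlew).
build_without_tests() {
  local root="$1"
  (cd "$root" && ./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain)
}

# Print the names of the files found in the distribution directory, one per line.
# Returns non-zero if the directory is missing or empty.
# $1: path to the extracted source tree.
list_distributions() {
  local dist_dir="$1/$DIST_SUBDIR"
  local f
  local status=1
  if [ ! -d "$dist_dir" ]; then
    echo "no distribution directory at $dist_dir" >&2
    return 1
  fi
  for f in "$dist_dir"/*; do
    # An unmatched glob stays literal; skip it.
    [ -e "$f" ] || continue
    basename "$f"
    status=0
  done
  return "$status"
}
```

After sourcing the script from the extracted source directory, running build_without_tests . followed by list_distributions . should print whatever archives the build produced; a non-zero exit from the second call suggests the build did not create a distribution.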
Quick Links