commit | 48f304dc316ed380fef2c5c7891f7ec41e0ddd30 | |
---|---|---|
author | chitralverma <chitralverma@gmail.com> | Thu Nov 21 09:26:44 2019 +0800 |
committer | William Guo <guoyp@apache.org> | Thu Nov 21 09:26:44 2019 +0800 |
tree | 0c528cd4c8a6a4cedf48c8485de6d110a9d50479 | |
parent | 350663f61842ede34f06224aec92e38d1bff92ee | |
[GRIFFIN-297] Allow support for additional file based data sources

**What changes were proposed in this pull request?**

The PR extends the current support beyond just Avro and Text for various file based data sources (Parquet, ORC, etc).

- Allows users to specify additional file based data sources like Parquet, CSV, TSV, ORC etc.
- Allows data to be read directly from stand-alone files as well as directories present in both local/ distributed file systems.
- Allows users to specify schema directly through options (useful for CSV/ TSV types).

A sample config looks like,

```
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "options": {
          "k1": "v1",
          "k2": "v2"
        },
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}
```

**Does this PR introduce any user-facing change?**

No

**How was this patch tested?**

Griffin test suite. Some additional unit test has also been added.

Author: chitralverma <chitralverma@gmail.com>

Closes #555 from chitralverma/allow_file_based_batch_connectors.
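To make the shape of the sample config concrete, here is a minimal Python sketch that parses it and pulls out the format, options, and paths of each `file` connector. It is not Griffin's implementation (Griffin itself runs on Spark); the helper name `file_connector_settings` and the handling of missing keys are assumptions made only for this illustration.

```python
import json

# The sample config from the commit message, reproduced as a JSON string.
SAMPLE = """
{
  "name": "source",
  "baseline": true,
  "connectors": [
    {
      "type": "file",
      "version": "1.7",
      "config": {
        "format": "parquet",
        "options": {"k1": "v1", "k2": "v2"},
        "paths": [
          "/home/chitral/path/to/source/",
          "/home/chitral/path/to/test.parquet"
        ]
      }
    }
  ]
}
"""


def file_connector_settings(source):
    """Hypothetical helper: return (format, options, paths) per "file" connector.

    Missing keys come back as None or empty containers; that choice is an
    assumption of this sketch, not documented Griffin behavior.
    """
    settings = []
    for connector in source.get("connectors", []):
        if connector.get("type") != "file":
            continue  # skip connectors of any other type
        config = connector.get("config", {})
        settings.append(
            (
                config.get("format"),
                config.get("options", {}),
                config.get("paths", []),
            )
        )
    return settings


settings = file_connector_settings(json.loads(SAMPLE))
for fmt, options, paths in settings:
    print(fmt, options, paths)
```

The three values correspond to what the commit message describes: a file format, free-form reader options (where a schema could be passed for CSV/TSV), and a list of paths that may mix directories and stand-alone files.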
Data quality (DQ) is a key criterion for many data consumers, such as IoT and machine learning. However, there is no standard agreement on how to determine “good” data. Apache Griffin is a model-driven data quality service platform where you can examine your data on demand. It provides a standard process to define data quality measures, executions and reports, allowing those examinations across multiple data systems. When you don't trust your data, or are concerned that poorly controlled data can negatively impact critical decisions, you can use Apache Griffin to ensure data quality.
You can try running Griffin in Docker by following the docker guide.
Follow the Apache Griffin Development Environment Build Guide to set up a development environment.
If you want to contribute code to Griffin, please follow the Apache Griffin Development Code Style Config Guide to keep a consistent code style.
If you want to deploy Griffin in your local environment, please follow the Apache Griffin Deployment Guide.
For more information about Griffin, please visit our website at: griffin home page.
You can contact us via email:
You can also subscribe to the latest information by sending an email to the dev-list and user-list:
dev-subscribe@griffin.apache.org users-subscribe@griffin.apache.org
You can access our issues on the JIRA page.
See How to Contribute for details on how to contribute code, documentation, etc.
Here's the most direct way to get your work merged into Apache Griffin.