commit	1aa8995af89b723b6fab6f0922afa1916a2f8084
author	chitralverma <chitralverma@gmail.com>	Mon Aug 10 10:49:42 2020 +0800
committer	William Guo <guoyp@apache.org>	Mon Aug 10 10:49:42 2020 +0800
tree	9e012dd92494749eb2efee9613006459d1f4514b
parent	7123269b5cc69806499bffc0757da63eaa1b9306

[GRIFFIN-305] Standardize sink hierarchy

**What changes were proposed in this pull request?**

Currently, the implementation of `Sinks` in Griffin poses the below issues. This PR aims at fixing these issues.

- `Sinks` are based on the recursive MultiSink class which is a sink itself but the underlying implementation is that of a `Seq` which causes ambiguity and isn't much useful. This has been removed.
- Some unused code like `SinkContext` has been removed.
- Data is converted from the performant DataFrame to RDD while persisting in both streaming and batch pipelines. A new method `sinkBatchRecords` has been added to allow operations directly on DataFrame for batch pipelines. Streaming will still use the old implementation which will be replaced with structured streaming.
- Refactored the methods of `Sink` like changed `start`/ `finish` to `open`/ `close` and `jobName` was incorrectly passed as `metricName`.
- Presently, only one instance of a sink with a given type can be defined in the env config. This will not allow the cases where you want to configure multiple sinks of same type like HDFS or JDBC. Added sink `name` to env config which is used to define the sink that should be used in the job config also.
- Updated all sinks as per the changes above. With some additional changes to ConsoleSink

**Does this PR introduce any user-facing change?**

Yes. As mentioned above, the sink config has changed in env and job configs.

**How was this patch tested?**

Griffin test suite and additional unit test cases

Author: chitralverma <chitralverma@gmail.com>

Closes #575 from chitralverma/standardize-sink-hierarchy.
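To make the user-facing change in the commit message above concrete, here is a minimal sketch of what an env config with named sinks might look like. Only the sink `name` field comes from the commit message. The surrounding keys (`sinks`, `type`, `config`), the option names inside `config`, and the sink names and paths are illustrative assumptions, so check the Griffin measure configuration guide for the exact schema.

```json
{
  "sinks": [
    {
      "name": "consoleSink",
      "type": "CONSOLE",
      "config": {
        "max.log.lines": 100
      }
    },
    {
      "name": "hdfsArchive",
      "type": "HDFS",
      "config": {
        "path": "hdfs:///griffin/archive"
      }
    },
    {
      "name": "hdfsAudit",
      "type": "HDFS",
      "config": {
        "path": "hdfs:///griffin/audit"
      }
    }
  ]
}
```

A job config would then select sinks by those names, for example `"sinks": ["consoleSink", "hdfsArchive"]` (again an assumed shape). Referring to sinks by name rather than by type is what lets two sinks of the same type, such as the two HDFS entries here, coexist.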

README.md

Apache Griffin

Data quality (DQ) is a key criterion for many data consumers such as IoT and machine learning; however, there is no standard agreement on how to determine “good” data. Apache Griffin is a model-driven data quality service platform where you can examine your data on demand. It provides a standard process to define data quality measures, executions, and reports, allowing those examinations across multiple data systems. When you don't trust your data, or are concerned that poorly controlled data can negatively impact critical decisions, you can use Apache Griffin to ensure data quality.

Getting Started

Quick Start

You can try running Griffin in Docker by following the Docker guide.

Environment for Dev

Follow the Apache Griffin Development Environment Build Guide to set up a development environment.
If you want to contribute code to Griffin, please follow the Apache Griffin Development Code Style Config Guide to keep the code style consistent.

Deployment at Local

If you want to deploy Griffin in your local environment, please follow the Apache Griffin Deployment Guide.

Community

For more information about Griffin, please visit the Griffin home page.

You can contact us via email:

dev-list: dev@griffin.apache.org
user-list: users@griffin.apache.org

You can also subscribe to the latest information by sending an email to the dev-list and user-list subscription addresses:

dev-subscribe@griffin.apache.org
users-subscribe@griffin.apache.org

You can browse our issues on the JIRA page.

Contributing

See How to Contribute for details on how to contribute code, documentation, etc.

Here's the most direct way to get your work merged into Apache Griffin.

Fork the project from GitHub
Clone down your fork
Implement your feature or bug fix and commit changes
Push the branch up to your fork
Send a pull request to Apache Griffin master branch