There are three main parts in the system: the Web UI, the Backend Services, and the Measure engine on Hadoop. Please refer to the high-level architecture diagram below.
Several key components are hosted on the web server: the RESTful web service, the Job scheduler, and the Core Service. We provide RESTful APIs to support all kinds of clients (web browsers, mobile apps, desktop applications, etc.). The Job scheduler is responsible for triggering the execution of measures and persisting the resulting metric values. The Core Service (the metadata engine) provides most of the management features, such as measure management, data asset management, notification management, and user settings.
We follow the principles in https://codeplanet.io/principles-good-restful-api-design/ to build our RESTful web services.
Root URL: https://example.org/api/v1/*
All resources should be authorized; refer to https://jersey.java.net/documentation/latest/security.html
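To make the versioned root URL and the authorization requirement concrete, here is a minimal sketch using the JDK's built-in `com.sun.net.httpserver` server. Griffin's actual web service is built on a Jersey stack; the `/api/v1/measures` path and the bearer-token scheme here are illustrative assumptions, not Griffin's real API.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch only: a versioned REST resource under /api/v1 that rejects
// unauthorized requests, mimicking a Jersey security filter.
public class RestApiSketch {
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/api/v1/measures", exchange -> {
            String auth = exchange.getRequestHeaders().getFirst("Authorization");
            if (auth == null || !auth.equals("Bearer demo-token")) {
                // Resources must be authorized: no valid token, no access.
                exchange.sendResponseHeaders(401, -1);
            } else {
                byte[] body = "[]".getBytes(StandardCharsets.UTF_8); // empty measure list
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start(0); // bind an ephemeral port
        System.out.println("listening on port " + server.getAddress().getPort());
        server.stop(0);
    }
}
```

In a real deployment the token check would live in a reusable request filter rather than inside each resource handler.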
The Job scheduler may leverage Quartz: http://www.quartz-scheduler.org/
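The scheduler's trigger-and-persist loop can be sketched with the JDK's `ScheduledExecutorService` standing in for Quartz. All class and method names below are illustrative assumptions, not Griffin's actual scheduler API.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch only: periodically trigger a measure and persist its metric value.
public class MeasureSchedulerSketch {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    // Stand-in for the metrics store; a real scheduler would write to a database.
    private final List<Double> persistedMetrics = new CopyOnWriteArrayList<>();

    // Schedule a measure (modeled as a Callable returning a metric value)
    // to run at a fixed interval, persisting each result.
    public ScheduledFuture<?> schedule(Callable<Double> measure, long periodMillis) {
        return executor.scheduleAtFixedRate(() -> {
            try {
                persistedMetrics.add(measure.call());
            } catch (Exception e) {
                e.printStackTrace(); // a real scheduler would record the failure
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    public List<Double> metrics() {
        return persistedMetrics;
    }

    public void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        MeasureSchedulerSketch scheduler = new MeasureSchedulerSketch();
        scheduler.schedule(() -> 0.99, 50); // a dummy "accuracy" measure
        Thread.sleep(200);
        scheduler.shutdown();
        System.out.println("persisted " + scheduler.metrics().size() + " metric values");
    }
}
```

Quartz would add what this sketch lacks: cron expressions, persistent job stores, and misfire handling.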
Here are the key components in Core Service:
For each component, we follow an MVC-based architecture.
Here are the typical classes/interfaces for each component:
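As one illustration of that MVC split, a component such as measure management typically separates into a model, a service interface, a repository-backed implementation, and a controller. The names below are hypothetical, not Griffin's actual classes.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Model: a minimal measure definition.
class Measure {
    final String name;
    final String dimension; // e.g. "accuracy", "validity"

    Measure(String name, String dimension) {
        this.name = name;
        this.dimension = dimension;
    }
}

// Service interface: the business operations the component exposes.
interface MeasureService {
    Measure create(String name, String dimension);
    Optional<Measure> find(String name);
    Collection<Measure> listAll();
}

// In-memory implementation; a real one would delegate to a database repository.
class InMemoryMeasureService implements MeasureService {
    private final Map<String, Measure> store = new ConcurrentHashMap<>();

    public Measure create(String name, String dimension) {
        Measure m = new Measure(name, dimension);
        store.put(name, m);
        return m;
    }

    public Optional<Measure> find(String name) {
        return Optional.ofNullable(store.get(name));
    }

    public Collection<Measure> listAll() {
        return store.values();
    }
}

// Controller: in the web layer this would be bound to REST routes
// such as /api/v1/measures.
public class MeasureController {
    private final MeasureService service = new InMemoryMeasureService();

    public Measure post(String name, String dimension) {
        return service.create(name, dimension);
    }

    public Optional<Measure> get(String name) {
        return service.find(name);
    }

    public static void main(String[] args) {
        MeasureController controller = new MeasureController();
        controller.post("total_count", "completeness");
        System.out.println(controller.get("total_count").isPresent());
    }
}
```

The other Core Service components (data assets, notifications, user settings) follow the same controller/service/repository layering.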
For data ingestion, Griffin can consume real-time data from streaming sources, and also supports batch data such as SQL tables or structured files.
For data processing, Griffin provides several measure libraries covering various data quality dimensions, such as accuracy and validity.
After business rules are appended to those measures, Griffin deploys the corresponding jobs to the Spark cluster.
For more details of the design, please refer to https://github.com/eBay/DQSolution/blob/master/griffin-doc/measures.md