FALCON-2059 BacklogMetricEmitter Service for Falcon Processes

Author: pavan.kolamuri <pavan.kolamuri@gmail.com>
Author: Pavan Kolamuri <pavan.kolamuri@appdynamics.com>

Reviewers: @pallavi-rao

Closes #212 from pavankumar526/master and squashes the following commits:

ad84f0f [Pavan Kolamuri] Fixed checkstyle issues
dea8f93 [pavan.kolamuri] Added doc in startup.properties
dbe3a7f [pavan.kolamuri] Added more log statements
d72d228 [pavan.kolamuri] Exception changed to throwable
46dcef8 [pavan.kolamuri] Fixed bug in oozieworkflowengine FALCON-2059
e92d3bc [pavan.kolamuri] Add isMissing method FALCON-2059
6d6cf81 [pavan.kolamuri] Handled when entity was deleted FALCON-2059
6c03701 [pavan.kolamuri] Fixed User authentication issue in oozie
81f0b03 [pavan.kolamuri] Rebased the patch
e3dbe88 [pavan.kolamuri] Handled multiple pipelines processes FALCON-2059
b5d9e70 [pavan.kolamuri] Addressed based on comments FALCON-2059
fb78fba [pavan.kolamuri] Refactored changes based on EntitySLAAlert service
80c015a [pavan.kolamuri] FALCON-2059 BacklogMetricEmitter Service for Falcon Processes
20 files changed
tree: 184a7119e9c5588227e40f1ad8a6791585a80352
  1. acquisition/
  2. addons/
  3. archival/
  4. build-tools/
  5. cli/
  6. client/
  7. common/
  8. distro/
  9. docs/
  10. examples/
  11. extensions/
  12. falcon-regression/
  13. falcon-ui/
  14. hadoop-dependencies/
  15. html5-ui/
  16. lifecycle/
  17. messaging/
  18. metrics/
  19. monitoring/
  20. oozie/
  21. oozie-el-extensions/
  22. prism/
  23. release-docs/
  24. replication/
  25. rerun/
  26. retention/
  27. scheduler/
  28. src/
  29. test-tools/
  30. test-util/
  31. titan/
  32. unit/
  33. webapp/
  34. .gitignore
  35. .reviewboardrc
  36. CHANGES.txt
  37. falcon_merge_pr.py
  38. Installation-steps.txt
  39. LICENSE.txt
  40. NOTICE.txt
  41. pom.xml
  42. README.md
README.md

Apache Falcon

Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.

Why Apache Falcon?

  • Dependencies across various data processing pipelines are not easy to establish. Gaps here typically lead to either incorrect/partial processing or expensive reprocessing. Defining a single feed multiple times can lead to inconsistencies and other issues.

  • Input data may not always arrive on time, so it must be possible to kick off processing without waiting for all data to arrive and to accommodate late data separately.

  • Feed management services such as feed retention, replication across clusters, and archival are burdensome for individual pipeline owners and are better offered as a service for all customers.

  • It should be easy to onboard new workflows/pipelines

  • Smoother integration with metastore/catalog

  • Provide notifications to end customers based on the availability of feed groups (logical groups of related feeds that are likely to be used together)

Online Documentation

You can find the documentation on the Apache Falcon website.

How to Contribute

Before opening a pull request, please go through the Contributing to Apache Falcon wiki. It lists the steps required before creating a PR and the conventions that we follow. If you are looking for issues to pick up, you can look at starter tasks or open tasks.

Release Notes

You can download the release notes for previous releases from the following links:

0.8

0.7