| ---+ Falcon Atlas Bridge |
| |
| ---++ Falcon Model |
| The default falcon modelling is available in org.apache.atlas.falcon.model.FalconDataModelGenerator. It defines the following types: |
| <verbatim> |
| falcon_cluster(ClassType) - super types [Infrastructure] - attributes [timestamp, colo, owner, tags] |
| falcon_feed(ClassType) - super types [DataSet] - attributes [timestamp, stored-in, owner, groups, tags] |
| falcon_feed_creation(ClassType) - super types [Process] - attributes [timestamp, stored-in, owner] |
| falcon_feed_replication(ClassType) - super types [Process] - attributes [timestamp, owner] |
| falcon_process(ClassType) - super types [Process] - attributes [timestamp, runs-on, owner, tags, pipelines, workflow-properties] |
| </verbatim> |
| |
| One falcon_process entity is created for every cluster that the falcon process is defined for. |
| |
| The entities are created and de-duped using unique qualifiedName attribute. They provide namespace and can be used for querying/lineage as well. The unique attributes are: |
| * falcon_process - <process name>@<cluster name> |
| * falcon_cluster - <cluster name> |
| * falcon_feed - <feed name>@<cluster name> |
| * falcon_feed_creation - <feed name> |
| * falcon_feed_replication - <feed name> |
| |
| ---++ Falcon Hook |
| Falcon supports listeners on falcon entity submission. This is used to add entities in Atlas using the model defined in org.apache.atlas.falcon.model.FalconDataModelGenerator. |
| The hook submits the request to a thread pool executor to avoid blocking the command execution. The thread submits the entities as message to the notification server and atlas server reads these messages and registers the entities. |
| * Add 'org.apache.atlas.falcon.service.AtlasService' to application.services in <falcon-conf>/startup.properties |
| * Link falcon hook jars in falcon classpath - 'ln -s <atlas-home>/hook/falcon/* <falcon-home>/server/webapp/falcon/WEB-INF/lib/' |
| * In <falcon_conf>/falcon-env.sh, set an environment variable as follows: |
| <verbatim> |
| export FALCON_SERVER_OPTS="$FALCON_SERVER_OPTS -Datlas.conf=<atlas-conf>" |
| </verbatim> |
| |
| The following properties in <atlas-conf>/atlas-application.properties control the thread pool and notification details: |
| * atlas.hook.falcon.synchronous - boolean, true to run the hook synchronously. default false |
| * atlas.hook.falcon.numRetries - number of retries for notification failure. default 3 |
| * atlas.hook.falcon.minThreads - core number of threads. default 5 |
| * atlas.hook.falcon.maxThreads - maximum number of threads. default 5 |
| * atlas.hook.falcon.keepAliveTime - keep alive time in msecs. default 10 |
| * atlas.hook.falcon.queueSize - queue size for the threadpool. default 10000 |
| |
| Refer [[Configuration][Configuration]] for notification related configurations |
| |
| |
| ---++ Limitations |
| * In falcon cluster entity, cluster name used should be uniform across components like hive, falcon, sqoop etc. If used with ambari, ambari cluster name should be used for cluster entity |