---+ Sqoop Atlas Bridge
---++ Sqoop Model
The default Sqoop model is available in org.apache.atlas.sqoop.model.SqoopDataModelGenerator. It defines the following types:
<verbatim>
sqoop_operation_type(EnumType) - values [IMPORT, EXPORT, EVAL]
sqoop_dbstore_usage(EnumType) - values [TABLE, QUERY, PROCEDURE, OTHER]
sqoop_process(ClassType) - super types [Process] - attributes [name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName]
sqoop_dbdatastore(ClassType) - super types [DataSet] - attributes [name, dbStoreType, storeUse, storeUri, source, description, ownerName]
</verbatim>
Entities are created and de-duplicated using a unique qualified name. The name also acts as a namespace and can be used in queries:
* sqoop_process - attribute name - sqoop-dbStoreType-storeUri-endTime
* sqoop_dbdatastore - attribute name - dbStoreType-connectorUrl-source
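The naming patterns above can be sketched in a few lines of Python. This is only an illustration of how the parts combine: the function names and sample values are hypothetical, and the hook's exact formatting may differ from a plain hyphen join.

```python
def sqoop_dbdatastore_name(db_store_type: str, connector_url: str, source: str) -> str:
    """Compose a name following the dbStoreType-connectorUrl-source pattern."""
    return "-".join([db_store_type, connector_url, source])


def sqoop_process_name(db_store_type: str, store_uri: str, end_time: int) -> str:
    """Compose a name following the sqoop-dbStoreType-storeUri-endTime pattern."""
    return "-".join(["sqoop", db_store_type, store_uri, str(end_time)])


# Hypothetical values, for illustration only.
url = "jdbc:mysql://db.example.com/sales"
store = sqoop_dbdatastore_name("mysql", url, "orders")
process = sqoop_process_name("mysql", url, 1450000000000)
print(store)
print(process)
```

Because the data store name contains no timestamp, repeated imports from the same source resolve to the same sqoop_dbdatastore entity, while each run yields a distinct sqoop_process name through its endTime.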
---++ Sqoop Hook
Sqoop added a !SqoopJobDataPublisher that publishes data to Atlas after an import job completes. Currently, the Sqoop hook supports only hiveImport.
The publisher adds entities to Atlas using the model defined in org.apache.atlas.sqoop.model.SqoopDataModelGenerator.
To enable the Atlas hook in your Sqoop setup, follow these steps. The properties go in <sqoop-conf>/sqoop-site.xml:
* Sqoop job publisher class. Currently, only one publishing class is supported.
<property>
<name>sqoop.job.data.publish.class</name>
<value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>
* Atlas cluster name
<property>
<name>atlas.cluster.name</name>
<value><clustername></value>
</property>
* Copy <atlas-conf>/atlas-application.properties to the Sqoop conf directory <sqoop-conf>/
* Link <atlas-home>/hook/sqoop/*.jar into the Sqoop lib directory
Refer to [[Configuration][Configuration]] for notification-related configuration.
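If the two properties need to be applied by a script rather than by hand, a minimal sketch using Python's standard xml.etree.ElementTree could look like the following. The helper set_property and the cluster name "primary" are hypothetical, and the in-memory document stands in for the real <sqoop-conf>/sqoop-site.xml.

```python
import xml.etree.ElementTree as ET


def set_property(config_root: ET.Element, name: str, value: str) -> None:
    """Add a <property> entry, or update its value if the name already exists."""
    for prop in config_root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    # No existing entry with this name: append a new one.
    prop = ET.SubElement(config_root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


# Stand-in for the parsed contents of <sqoop-conf>/sqoop-site.xml.
root = ET.fromstring("<configuration></configuration>")
set_property(root, "sqoop.job.data.publish.class", "org.apache.atlas.sqoop.hook.SqoopHook")
set_property(root, "atlas.cluster.name", "primary")  # placeholder cluster name
print(ET.tostring(root, encoding="unicode"))
```

Updating in place rather than always appending keeps the file free of duplicate entries when the script is run more than once.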
---++ Limitations
* The Sqoop hook currently captures only the hiveImport operation.