| ---+ Sqoop Atlas Bridge |
| |
| ---++ Sqoop Model |
| The default Sqoop model includes the following types: |
| * Entity types: |
|    * sqoop_process |
|       * super-types: Process |
|       * attributes: name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName |
|    * sqoop_dbdatastore |
|       * super-types: !DataSet |
|       * attributes: name, dbStoreType, storeUse, storeUri, source, description, ownerName |
| |
| * Enum types: |
|    * sqoop_operation_type |
|       * values: IMPORT, EXPORT, EVAL |
|    * sqoop_dbstore_usage |
|       * values: TABLE, QUERY, PROCEDURE, OTHER |
| |
| Entities are created and de-duplicated using a unique qualified name. The qualified name provides a namespace and can also be used for querying: |
| * sqoop_process.qualifiedName - dbStoreType-storeUri-endTime |
| * sqoop_dbdatastore.qualifiedName - dbStoreType-storeUri-source |
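
The two patterns above can be sketched as plain string composition. This is only an illustration, assuming the parts are joined with a literal hyphen exactly as written above. The function names, the sample JDBC URI, and the epoch-millisecond form of endTime are hypothetical, and the hook itself may format these values differently.

```python
def sqoop_dbdatastore_qualified_name(db_store_type: str, store_uri: str, source: str) -> str:
    """Compose dbStoreType-storeUri-source (hyphen separator assumed from the pattern above)."""
    return f"{db_store_type}-{store_uri}-{source}"


def sqoop_process_qualified_name(db_store_type: str, store_uri: str, end_time: int) -> str:
    """Compose dbStoreType-storeUri-endTime.

    end_time is assumed here to be epoch milliseconds; the real representation
    used by the hook is not stated in this document.
    """
    return f"{db_store_type}-{store_uri}-{end_time}"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    uri = "jdbc:mysql://db.example.com/sales"
    print(sqoop_dbdatastore_qualified_name("mysql", uri, "customers"))
    print(sqoop_process_qualified_name("mysql", uri, 1500000000000))
```

Because the process name includes endTime, each completed import would produce a distinct sqoop_process entity, while repeated imports from the same store and source would resolve to the same sqoop_dbdatastore entity.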
| |
| ---++ Sqoop Hook |
| Sqoop added a !SqoopJobDataPublisher that publishes data to Atlas after an import job completes. Currently, !SqoopHook supports only hiveImport. |
| The published data is used to add entities in Atlas using the model detailed above. |
| |
| Follow the instructions below to set up the Atlas hook in Sqoop: |
| * Set up the Atlas hook in <sqoop-conf>/sqoop-site.xml by adding the following: |
| <verbatim> |
| <property> |
| <name>sqoop.job.data.publish.class</name> |
| <value>org.apache.atlas.sqoop.hook.SqoopHook</value> |
| </property></verbatim> |
| * Copy <atlas-conf>/atlas-application.properties to the Sqoop conf directory <sqoop-conf>/ |
| * Link <atlas-home>/hook/sqoop/*.jar in the Sqoop lib directory |
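
The copy and link steps above could be scripted roughly as follows. This is a minimal sketch, not an official installer: the function name install_sqoop_hook and its parameters are hypothetical, the four directories stand in for the <atlas-home>, <atlas-conf>, <sqoop-conf> and Sqoop lib placeholders, and "link" is assumed to mean a symbolic link. It does not edit sqoop-site.xml; that property still has to be added by hand.

```python
import shutil
from pathlib import Path


def install_sqoop_hook(atlas_home, atlas_conf, sqoop_conf, sqoop_lib):
    """Copy atlas-application.properties and symlink the hook jars (hypothetical helper)."""
    atlas_home, atlas_conf = Path(atlas_home), Path(atlas_conf)
    sqoop_conf, sqoop_lib = Path(sqoop_conf), Path(sqoop_lib)

    # Copy <atlas-conf>/atlas-application.properties into <sqoop-conf>/
    copied = Path(shutil.copy2(atlas_conf / "atlas-application.properties", sqoop_conf))

    # Symlink every jar under <atlas-home>/hook/sqoop/ into the Sqoop lib directory
    linked = []
    for jar in sorted((atlas_home / "hook" / "sqoop").glob("*.jar")):
        link = sqoop_lib / jar.name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link or file left by an earlier run
        link.symlink_to(jar.resolve())
        linked.append(link)
    return copied, linked
```

Symlinking rather than copying keeps the Sqoop lib directory pointing at the jars shipped with the Atlas installation, so an upgrade under the same <atlas-home> path is picked up without repeating the step.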
| |
| Refer to [[Configuration][Configuration]] for notification-related configuration. |
| |
| ---++ NOTES |
| * Currently, the Sqoop hook captures only the following Sqoop operation: hiveImport |