| --- |
| name: HBase |
| route: /HookHBase |
| menu: Documentation |
| submenu: Hooks |
| --- |
| |
| import themen from 'theme/styles/styled-colors'; |
| import * as theme from 'react-syntax-highlighter/dist/esm/styles/hljs'; |
| import SyntaxHighlighter from 'react-syntax-highlighter'; |
| |
| # Apache Atlas Hook & Bridge for Apache HBase |
| |
| ## HBase Model |
| HBase model includes the following types: |
| * Entity types: |
| * hbase_namespace |
| * super-types: Asset |
| * attributes: qualifiedName, name, description, owner, clusterName, parameters, createTime, modifiedTime |
| * hbase_table |
| * super-types: DataSet |
| * attributes: qualifiedName, name, description, owner, namespace, column_families, uri, parameters, createtime, modifiedtime, maxfilesize, isReadOnly, isCompactionEnabled, isNormalizationEnabled, ReplicaPerRegion, Durability |
| * hbase_column_family |
| * super-types: DataSet |
| * attributes: qualifiedName, name, description, owner, columns, createTime, bloomFilterType, compressionType, compactionCompressionType, encryptionType, inMemoryCompactionPolicy, keepDeletedCells, maxversions, minVersions, datablockEncoding, storagePolicy, ttl, blockCachedEnabled, cacheBloomsOnWrite, cacheDataOnWrite, evictBlocksOnClose, prefetchBlocksOnOpen, newVersionsBehavior, isMobEnabled, mobCompactPartitionPolicy |
| |
| HBase entities are created and de-duped in Atlas using unique attribute qualifiedName, whose value should be formatted as detailed below. Note that namespaceName, tableName and columnFamilyName should be in lower case. |
| |
| <SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}> |
| {`hbase_namespace.qualifiedName: <namespaceName>@<clusterName> |
| hbase_table.qualifiedName: <namespaceName>:<tableName>@<clusterName> |
| hbase_column_family.qualifiedName: <namespaceName>:<tableName>.<columnFamilyName>@<clusterName>`} |
| </SyntaxHighlighter> |
| |
| |
| ## HBase Hook |
| Atlas HBase hook registers with HBase master as a co-processor. On detecting changes to HBase namespaces/tables/column-families, Atlas hook updates the metadata in Atlas via Kafka notifications. |
| Follow the instructions below to setup Atlas hook in HBase: |
| * Register Atlas hook in hbase-site.xml by adding the following: |
| |
| <SyntaxHighlighter wrapLines={true} language="xml" style={theme.dark}> |
| {`<property> |
| <name>hbase.coprocessor.master.classes</name> |
| <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value> |
| </property>`} |
| </SyntaxHighlighter> |
| |
| * untar apache-atlas-${project.version}-hbase-hook.tar.gz |
| * cd apache-atlas-hbase-hook-${project.version} |
| * Copy entire contents of folder apache-atlas-hbase-hook-${project.version}/hook/hbase to `<atlas package>`/hook/hbase |
| * Link Atlas hook jars in HBase classpath - 'ln -s `<atlas package>`/hook/hbase/* `<hbase-home>`/lib/' |
| * Copy `<atlas-conf>`/atlas-application.properties to the HBase conf directory. |
| |
| The following properties in atlas-application.properties control the thread pool and notification details: |
| |
| <SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}> |
| {`atlas.hook.hbase.synchronous=false # whether to run the hook synchronously. false is recommended to avoid delays in HBase operations. Default: false |
| atlas.hook.hbase.numRetries=3 # number of retries for notification failure. Default: 3 |
| atlas.hook.hbase.queueSize=10000 # queue size for the threadpool. Default: 10000 |
| atlas.cluster.name=primary # clusterName to use in qualifiedName of entities. Default: primary |
| atlas.kafka.zookeeper.connect= # Zookeeper connect URL for Kafka. Example: localhost:2181 |
| atlas.kafka.zookeeper.connection.timeout.ms=30000 # Zookeeper connection timeout. Default: 30000 |
| atlas.kafka.zookeeper.session.timeout.ms=60000 # Zookeeper session timeout. Default: 60000 |
| atlas.kafka.zookeeper.sync.time.ms=20 # Zookeeper sync time. Default: 20`} |
| </SyntaxHighlighter> |
| |
| Other configurations for Kafka notification producer can be specified by prefixing the configuration name with "atlas.kafka.". |
| For list of configuration supported by Kafka producer, please refer to [Kafka Producer Configs](http://kafka.apache.org/documentation/#producerconfigs) |
| |
| ## NOTES |
| * Only the namespace, table and column-family create/update/delete operations are captured by Atlas HBase hook. Changes to columns are be captured. |
| |
| |
| ## Importing HBase Metadata |
| Apache Atlas provides a command-line utility, import-hbase.sh, to import metadata of Apache HBase namespaces and tables into Apache Atlas. |
| This utility can be used to initialize Apache Atlas with namespaces/tables present in an Apache HBase cluster. |
| This utility supports importing metadata of a specific table, tables in a specific namespace or all tables. |
| |
| <SyntaxHighlighter wrapLines={true} language="java" style={theme.dark}> |
| {`Usage 1: <atlas package>/hook-bin/import-hbase.sh |
| Usage 2: <atlas package>/hook-bin/import-hbase.sh [-n <namespace regex> OR --namespace <namespace regex>] [-t <table regex> OR --table <table regex>] |
| Usage 3: <atlas package>/hook-bin/import-hbase.sh [-f <filename>] |
| File Format: |
| namespace1:tbl1 |
| namespace1:tbl2 |
| namespace2:tbl1`} |
| </SyntaxHighlighter> |