From the directory where you would like Apache Atlas to be installed, run the following commands:
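As an illustration (the tarball name and version placeholder are assumptions; use the name of the distribution you built or downloaded):

```shell
# extract the server distribution
tar -xzvf apache-atlas-<version>-server.tar.gz

# change into the extracted package directory
cd apache-atlas-<version>

# start Apache Atlas
bin/atlas_start.py
```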
To run Apache Atlas with local Apache HBase and Apache Solr instances that are started and stopped along with Atlas start/stop, run the following commands:
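A sketch of this, assuming the MANAGE_LOCAL_HBASE and MANAGE_LOCAL_SOLR environment variables consulted by the Atlas start script in typical releases (verify the variable names against your version's atlas-env.sh):

```shell
# tell the start script to manage embedded HBase and Solr instances
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true

bin/atlas_start.py
```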
To stop Apache Atlas, run the following command:
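From the installation directory, using the stop script that ships alongside atlas_start.py:

```shell
bin/atlas_stop.py
```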
By default, the configuration directory used by Apache Atlas is {package dir}/conf. To override this, set the environment variable ATLAS_CONF to the path of the conf directory.
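For example (the path is illustrative):

```shell
export ATLAS_CONF=/etc/atlas/conf
```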
Environment variables needed to run Apache Atlas can be set in the atlas-env.sh file in the conf directory. This file is sourced by the Apache Atlas scripts before any commands are executed. The following environment variables are available to set:
#export ATLAS_OPTS=
#export ATLAS_CLIENT_OPTS=
#export ATLAS_CLIENT_HEAP=
#export ATLAS_SERVER_OPTS=
#export ATLAS_SERVER_HEAP=
#export ATLAS_HOME_DIR=
#export ATLAS_LOG_DIR=
#export ATLAS_PID_DIR=
#export ATLAS_EXPANDED_WEBAPP_DIR=
Settings to support a large number of metadata objects
If you plan to store a large number of metadata objects, it is recommended that you use values tuned for better GC performance of the JVM.
The following values are common server-side options:
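The values themselves are elided here; as an illustration, a commonly used set of server-side GC options for Atlas looks like the following (the specific flags and paths are assumptions to tune for your deployment):

```shell
# illustrative GC tuning; adjust flags, heap-dump path, and GC log path for your environment
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```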
The -XX:SoftRefLRUPolicyMSPerMB option was found to be particularly helpful in regulating GC performance for query-heavy workloads with many concurrent users.
The following values are recommended for JDK 8:
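The recommended values are elided here; as an assumption based on typical large-deployment Atlas JDK 8 settings, heap options in this spirit (sizes must be scaled to your hardware):

```shell
# illustrative JDK 8 heap sizing; scale -Xms/-Xmx to the memory available on your server
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
```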
NOTE for Mac OS users: if you are using Mac OS, you will need to configure ATLAS_SERVER_OPTS (explained above).
In {package dir}/conf/atlas-env.sh, uncomment the following line:
#export ATLAS_SERVER_OPTS=
and change it to look as below:
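A sketch of the Mac OS-specific setting (the exact JVM properties are an assumption; verify against the documentation for your release):

```shell
# headless mode plus empty Kerberos realm/KDC properties, commonly needed on Mac OS
export ATLAS_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
```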
By default, Apache Atlas uses JanusGraph as the graph repository; it is currently the only graph repository implementation available. The Apache HBase versions currently supported are 1.1.x. For configuring Apache Atlas graph persistence on Apache HBase, please see "Graph persistence engine - HBase" in the Configuration section for more details.
Apache HBase tables used by Apache Atlas can be set using the following configurations:
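For illustration, the table-name settings in atlas-application.properties look like this (the property names and table names are assumptions based on typical Atlas configurations; confirm against your release):

```
atlas.graph.storage.hbase.table=apache_atlas_janus
atlas.audit.hbase.tablename=apache_atlas_entity_audit
```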
By default, Apache Atlas uses JanusGraph as the graph repository; it is currently the only graph repository implementation available. For configuring JanusGraph to work with Apache Solr, please follow the instructions below.
Install Apache Solr if it is not already running. The version of Apache Solr supported is 5.5.1. It can be downloaded from http://archive.apache.org/dist/lucene/solr/5.5.1/solr-5.5.1.tgz
Start Apache Solr in cloud mode.
SolrCloud mode uses a ZooKeeper service as a highly available, central location for cluster management. For a small cluster, running with an existing ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate ZooKeeper quorum with at least 3 servers. Note: Apache Atlas currently supports Apache Solr in "cloud" mode only; "http" mode is not supported. For more information, refer to the Apache Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud
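For example, starting Solr in cloud mode against an existing ZooKeeper ensemble (SOLR_HOME, the connect string, and the port are placeholders):

```shell
# -c starts Solr in SolrCloud mode; -z points it at the ZooKeeper ensemble
$SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983
```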
Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with Atlas on a single-node instance. Otherwise, specify numShards according to the number of hosts in the Solr cluster and the maxShardsPerNode configuration. The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.
The number of replicas (replicationFactor) can be set according to the redundancy required.
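The collections that the JanusGraph index backend expects can then be created; the index names vertex_index, edge_index, and fulltext_index below are the ones used in typical Atlas setups (treat them, and the placeholder shard/replica counts, as assumptions to verify against your configuration):

```shell
# run from the Solr installation directory
bin/solr create -c vertex_index -shards <numShards> -replicationFactor <replicationFactor>
bin/solr create -c edge_index -shards <numShards> -replicationFactor <replicationFactor>
bin/solr create -c fulltext_index -shards <numShards> -replicationFactor <replicationFactor>
```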
Also note that Apache Solr will automatically be called to create the indexes when the Apache Atlas server is started, if the SOLR_BIN and SOLR_CONF environment variables are set and the search indexing backend is set to 'solr5'.
For more information on JanusGraph Solr configuration, please refer to http://docs.janusgraph.org/0.2.0/solr.html
Pre-requisites for running Apache Solr in cloud mode
Configuring Elasticsearch as the indexing backend for the Graph Repository (Tech Preview)
By default, Apache Atlas uses JanusGraph as the graph repository; it is currently the only graph repository implementation available. For configuring JanusGraph to work with Elasticsearch, please follow the instructions below.
Install an Elasticsearch cluster. The version currently supported is 5.6.4; it can be downloaded from https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.4.tar.gz
For simple testing, a single Elasticsearch node can be started by using the 'elasticsearch' command in the bin directory of the Elasticsearch distribution.
Change the Apache Atlas configuration to point to the Elasticsearch instance. Please make sure the following configurations are set to the values below in ATLAS_HOME/conf/atlas-application.properties:
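The values themselves are elided here; settings in this spirit (property names assumed from typical JanusGraph-over-Elasticsearch Atlas configurations; the hostname is a placeholder) would be:

```
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.hostname=<hostname(s) of the Elasticsearch nodes>
atlas.graph.index.search.elasticsearch.client-only=true
```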
For more information on JanusGraph configuration for Elasticsearch, please refer to http://docs.janusgraph.org/0.2.0/elasticsearch.html
Apache Atlas uses Apache Kafka to ingest metadata from other components at runtime. This is described in more detail in the Architecture section. Depending on the configuration of Apache Kafka, sometimes you might need to set up the topics explicitly before using Apache Atlas. To do so, Apache Atlas provides a script, bin/atlas_kafka_setup.py, which can be run from the Apache Atlas server. In some environments, the hooks might start getting used before the Apache Atlas server itself is set up. In such cases, the topics can be created on the hosts where the hooks are installed, using a similar script, hook-bin/atlas_kafka_setup_hook.py. Both scripts use the configuration in atlas-application.properties for setting up the topics. Please refer to the Configuration section for these details.
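Invocation is a single command on the relevant host, for example:

```shell
# on the Apache Atlas server host (reads Kafka settings from atlas-application.properties)
bin/atlas_kafka_setup.py

# or, on a host where only the hooks are installed
hook-bin/atlas_kafka_setup_hook.py
```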
There are a few steps that setup dependencies of Apache Atlas. One such example is setting up the JanusGraph schema in the storage backend of choice. In a simple single server setup, these are automatically setup with default configuration when the server first accesses these dependencies.
However, there are scenarios when we may want to run setup steps explicitly as one time operations. For example, in a multiple server scenario using High Availability, it is preferable to run setup steps from one of the server instances the first time, and then start the services.
To run these steps one time, execute the command bin/atlas_start.py -setup from a single Apache Atlas server instance.
However, the Apache Atlas server does take care of parallel executions of the setup steps, and running the setup steps multiple times is idempotent. Therefore, if one chooses to run the setup steps as part of server startup for convenience, they should enable the configuration option atlas.server.run.setup.on.start by defining it with the value true in the atlas-application.properties file.
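That is, in atlas-application.properties:

```
atlas.server.run.setup.on.start=true
```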
Here are a few examples of calling Apache Atlas REST APIs via the curl command:
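The examples themselves are elided here; as an illustration (the host, port, and credentials are placeholders, and the endpoint paths are assumptions based on the Atlas v2 REST API):

```shell
# fetch the Atlas server version
curl -u admin:admin http://localhost:21000/api/atlas/admin/version

# list type definition headers
curl -u admin:admin http://localhost:21000/api/atlas/v2/types/typedefs/headers

# basic search for entities of a given type
curl -u admin:admin "http://localhost:21000/api/atlas/v2/search/basic?typeName=hive_table"
```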
If the setup of the Apache Atlas service fails for any reason, the next run of setup (either by an explicit invocation of atlas_start.py -setup, or by enabling the configuration option atlas.server.run.setup.on.start) will fail with a message such as: A previous setup run may not have completed cleanly. In such cases, you would need to manually ensure that setup can run, and delete the ZooKeeper node at /apache_atlas/setup_in_progress before attempting to run setup again.
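The node can be deleted with the ZooKeeper CLI, for example (the connect string is a placeholder):

```shell
# run from the ZooKeeper installation's bin directory
zkCli.sh -server <zookeeper_host:port> delete /apache_atlas/setup_in_progress
```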
If the setup failed due to Apache HBase schema setup errors, it may be necessary to repair the Apache HBase schema. If no data has been stored, one can also disable and drop the Apache HBase tables used by Apache Atlas, and run setup again.