
 ---++ Building & Installing Apache Atlas

 ---+++ Building Atlas

 <verbatim>
 git clone https://git-wip-us.apache.org/repos/asf/incubator-atlas.git atlas

 cd atlas

 export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install
 </verbatim>

 Once the build successfully completes, artifacts can be packaged for deployment.

 <verbatim>

 mvn clean package -Pdist

 </verbatim>

 The tarball can be found at atlas/distro/target/apache-atlas-${project.version}-bin.tar.gz

 The tarball is structured as follows:

 <verbatim>

 |- bin
    |- atlas_start.py
    |- atlas_stop.py
    |- atlas_config.py
    |- quick_start.py
    |- cputil.py
 |- conf
    |- application.properties
    |- client.properties
    |- atlas-env.sh
    |- log4j.xml
    |- solr
       |- currency.xml
       |- lang
          |- stopwords_en.txt
       |- protowords.txt
       |- schema.xml
       |- solrconfig.xml
       |- stopwords.txt
       |- synonyms.txt
 |- docs
 |- server
    |- webapp
       |- atlas.war
 |- README
 |- NOTICE.txt
 |- LICENSE.txt
 |- DISCLAIMER.txt
 |- CHANGES.txt

 </verbatim>
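After unpacking, a quick sanity check that a few key entries from the layout above are present can catch an incomplete package before the first start. The following is only a sketch: the function name =check_atlas_layout= is hypothetical and not part of the Atlas distribution, and it checks just a handful of the listed paths.

```shell
# Hypothetical helper (not shipped with Atlas): verify that an unpacked
# distribution directory contains a few of the entries listed above.
check_atlas_layout() {
    dir=$1
    missing=0
    for entry in bin/atlas_start.py bin/atlas_stop.py \
                 conf/application.properties conf/atlas-env.sh \
                 server/webapp/atlas.war; do
        if [ ! -e "$dir/$entry" ]; then
            # Report every missing entry instead of stopping at the first one.
            echo "missing: $entry"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling the function with the path of the unpacked directory prints one line per missing entry and returns a non-zero status if anything is absent.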

 ---+++ Installing & Running Atlas

 *Installing Atlas*
 <verbatim>
 tar -xzvf apache-atlas-${project.version}-bin.tar.gz
 cd atlas-${project.version}
 </verbatim>

 *Configuring Atlas*

 By default, Atlas uses {package dir}/conf as its config directory. To override this, set the environment
 variable METADATA_CONF to the path of the conf dir.

 The Atlas conf directory includes atlas-env.sh. This file can be used to set the environment
 variables that you need for your services, along with any other environment
 variables you might need. It is sourced by the Atlas scripts before any commands are
 executed. The following environment variables are available to set.

 <verbatim>
 # The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
 #export JAVA_HOME=

 # any additional java opts you want to set. This will apply to both client and server operations
 #export METADATA_OPTS=

 # any additional java opts that you want to set for client only
 #export METADATA_CLIENT_OPTS=

 # java heap size we want to set for the client. Default is 1024MB
 #export METADATA_CLIENT_HEAP=

 # any additional opts you want to set for atlas service.
 #export METADATA_SERVER_OPTS=

 # java heap size we want to set for the atlas server. Default is 1024MB
 #export METADATA_SERVER_HEAP=

 # What is considered the atlas home dir. Default is the base location of the installed software
 #export METADATA_HOME_DIR=

 # Where log files are stored. Default is logs directory under the base install location
 #export METADATA_LOG_DIR=

 # Where pid files are stored. Default is logs directory under the base install location
 #export METADATA_PID_DIR=

 # Where the atlas titan db data is stored. Default is logs/data directory under the base install location
 #export METADATA_DATA_DIR=

 # Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.
 #export METADATA_EXPANDED_WEBAPP_DIR=
 </verbatim>
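As an illustration, a few of the variables above might be uncommented and set as follows. Every path here is a hypothetical placeholder chosen for the example, not an Atlas default; substitute the locations used on your host.

```shell
# Example atlas-env.sh overrides. All paths below are hypothetical
# placeholders, not Atlas defaults.
export JAVA_HOME=/usr/lib/jvm/java
export METADATA_LOG_DIR=/var/log/atlas
export METADATA_PID_DIR=/var/run/atlas
export METADATA_DATA_DIR=/var/lib/atlas/data
```

Variables left commented out keep the defaults described in the comments above.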


 *NOTE for Mac OS users*
 <verbatim>
 If you are using Mac OS, you will need to configure METADATA_SERVER_OPTS (explained above).

 In {package dir}/conf/atlas-env.sh, uncomment the following line
 #export METADATA_SERVER_OPTS=

 and change it to look as below
 export METADATA_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
 </verbatim>
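If the same atlas-env.sh is shared between Mac OS and other hosts, the setting can be made conditional. This is a sketch only: the function name =atlas_server_opts_for= is hypothetical, and it relies on =uname -s= printing "Darwin" on Mac OS.

```shell
# Sketch for atlas-env.sh: apply the Mac OS options only on Darwin hosts.
# The function name is hypothetical; the option string is the one given above.
atlas_server_opts_for() {
    os_name=${1:-$(uname -s)}
    if [ "$os_name" = "Darwin" ]; then
        echo "-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    fi
}

# Export the options only when the function produced any for this host.
opts=$(atlas_server_opts_for)
if [ -n "$opts" ]; then
    export METADATA_SERVER_OPTS="$opts"
fi
```

On other operating systems the function prints nothing, so METADATA_SERVER_OPTS is left untouched.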

 *HBase as the Storage Backend for the Graph Repository*

 By default, Atlas uses Titan as the graph repository; it is the only graph repository implementation currently available.
 The HBase versions currently supported are 1.1.x. For details on configuring Atlas graph persistence on HBase, see the "Configuration - Graph persistence engine - HBase" section.

 Prerequisites for running HBase as a distributed cluster:
  * 3 or 5 ZooKeeper nodes
  * At least 3 RegionServer nodes. Ideally, run the DataNodes on the same hosts as the RegionServers for data locality.

 *Configuring Solr as the Indexing Backend for the Graph Repository*

 By default, Atlas uses Titan as the graph repository; it is the only graph repository implementation currently available.
 To configure Titan to work with Solr, follow the instructions below.
 <verbatim>
 * Install Solr if it is not already running. The supported version of Solr is 5.2.1. It can be downloaded from http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz

 * Start Solr in cloud mode.
   SolrCloud mode uses a ZooKeeper service as a highly available, central location for cluster management.
   For a small cluster, running with an existing ZooKeeper quorum should be fine. For larger clusters, you would want to run a separate ZooKeeper quorum with at least 3 servers.
   Note: Atlas currently supports Solr in "cloud" mode only. "http" mode is not supported. For more information, refer to the Solr documentation - https://cwiki.apache.org/confluence/display/solr/SolrCloud

 * For example, to bring up a Solr node listening on port 8983 on a machine, you can use the command:

       $SOLR_HOME/bin/solr start -c -z <zookeeper_host:port> -p 8983

 * Run the following commands from the SOLR_HOME directory to create the Solr collections corresponding to the indexes that Atlas uses. If the Atlas and Solr instances are on 2 different hosts,
   first copy the required configuration files from ATLAS_HOME/conf/solr on the Atlas host to the Solr host. SOLR_CONF in the commands below refers to the directory on the Solr host
   to which the Solr configuration files have been copied:

   bin/solr create -c vertex_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
   bin/solr create -c edge_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor
   bin/solr create -c fulltext_index -d SOLR_CONF -shards #numShards -replicationFactor #replicationFactor

   Note: If numShards and replicationFactor are not specified, they default to 1, which suffices if you are trying out Solr with Atlas on a single-node instance.
   Otherwise, specify numShards according to the number of hosts in the Solr cluster and the maxShardsPerNode configuration.
   The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.

   The number of replicas (replicationFactor) can be set according to the redundancy required.

 * Change the Atlas configuration to point to the Solr instance setup. Make sure the following configurations are set to the values below in ATLAS_HOME/conf/application.properties
  atlas.graph.index.search.backend=solr5
  atlas.graph.index.search.solr.mode=cloud
  atlas.graph.index.search.solr.zookeeper-url=<the ZK quorum setup for solr as comma separated value> e.g. 10.1.6.4:2181,10.1.6.5:2181

 * Restart Atlas
 </verbatim>
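The three =bin/solr create= calls above differ only in the collection name, so they can be wrapped in a loop. The sketch below assumes it is run from the SOLR_HOME directory; the function name, its argument order, and the =SOLR_BIN= override are hypothetical conveniences, while the =create= flags are the ones shown in the instructions.

```shell
# Create the three Solr collections Atlas uses, one "bin/solr create" call each.
# SOLR_BIN defaults to bin/solr (run from SOLR_HOME); setting SOLR_BIN=echo
# gives a dry run. The function name and argument order are hypothetical.
create_atlas_collections() {
    solr_conf=$1
    num_shards=${2:-1}
    replication_factor=${3:-1}
    for collection in vertex_index edge_index fulltext_index; do
        ${SOLR_BIN:-bin/solr} create -c "$collection" -d "$solr_conf" \
            -shards "$num_shards" -replicationFactor "$replication_factor" || return 1
    done
}
```

Setting =SOLR_BIN=echo= before calling the function prints the arguments of each =create= call without contacting Solr, which is a cheap way to review them first.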

 For more information on Titan Solr configuration, please refer to http://s3.thinkaurelius.com/docs/titan/0.5.4/solr.htm
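Before restarting Atlas, the three index settings from the last configuration step can be checked mechanically. This is a minimal sketch with a hypothetical function name; it expects each property on its own line with no trailing whitespace, and it only checks that the ZooKeeper URL is non-empty.

```shell
# Hypothetical check (not part of Atlas): confirm the three Solr index
# settings are present in application.properties with the expected values.
check_solr_index_props() {
    props_file=$1
    grep -q '^atlas\.graph\.index\.search\.backend=solr5$' "$props_file" &&
    grep -q '^atlas\.graph\.index\.search\.solr\.mode=cloud$' "$props_file" &&
    grep -q '^atlas\.graph\.index\.search\.solr\.zookeeper-url=..*' "$props_file"
}
```

The function returns a non-zero status if any of the three lines is missing or has a different value.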

 Prerequisites for running Solr in cloud mode:
   * Memory - Solr is both memory and CPU intensive. Make sure the server running Solr has adequate memory, CPU and disk.
     Solr works well with 32 GB RAM. Plan to provide as much memory as possible to the Solr process.
   * Disk - If the number of entities to be stored is large, plan to have at least 500 GB of free space in the volume where Solr is going to store the index data.
   * SolrCloud has support for replication and sharding. It is highly recommended to use SolrCloud with at least two Solr nodes running on different servers with replication enabled.
     If using SolrCloud, you also need ZooKeeper installed and configured with 3 or 5 ZooKeeper nodes.

 *Starting Atlas Server*
 <verbatim>
 bin/atlas_start.py [-port <port>]
 </verbatim>

 * To change the port, use the -port option.
 * By default, the Atlas server starts with conf from {package dir}/conf. To override this (to use the same conf
 across multiple Atlas upgrades), set the environment variable METADATA_CONF to the path of the conf dir.
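Both options can be combined in one invocation. The wrapper below is a sketch: the function name and the =ATLAS_START= override are hypothetical, and it assumes it is run from the package directory so that bin/atlas_start.py resolves.

```shell
# Hypothetical wrapper: start Atlas with an external conf dir and a chosen port.
# ATLAS_START is overridable so the wrapper can be dry-run with "echo".
start_atlas() {
    conf_dir=$1
    port=$2
    # METADATA_CONF is set only in the environment of the start script.
    METADATA_CONF="$conf_dir" ${ATLAS_START:-bin/atlas_start.py} -port "$port"
}
```

For example, =start_atlas /etc/atlas/conf 21443= would start the server against a conf directory kept outside the package; both the path and the port number are placeholders.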

 *Using Atlas*
 <verbatim>
 * Quick start model - sample model and data
   bin/quick_start.py [<atlas endpoint>]

 * Verify if the server is up and running
   curl -v http://localhost:21000/api/atlas/admin/version
   {"Version":"v0.1"}

 * List the types in the repository
   curl -v http://localhost:21000/api/atlas/types
   {"results":["Process","Infrastructure","DataSet"],"count":3,"requestId":"1867493731@qtp-262860041-0 - 82d43a27-7c34-4573-85d1-a01525705091"}

 * List the instances for a given type
   curl -v "http://localhost:21000/api/atlas/entities?type=hive_table"
   {"requestId":"788558007@qtp-44808654-5","list":["cb9b5513-c672-42cb-8477-b8f3e537a162","ec985719-a794-4c98-b98f-0509bd23aac0","48998f81-f1d3-45a2-989a-223af5c1ed6e","a54b386e-c759-4651-8779-a099294244c4"]}

   curl -v http://localhost:21000/api/atlas/entities/list/hive_db

 * Search for entities (instances) in the repository
   curl -v "http://localhost:21000/api/atlas/discovery/search/dsl?query=from%20hive_table"
 </verbatim>
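Because the server can take a moment to come up, scripts that call the endpoints above may want to wait for the version check to succeed first. This is a sketch under stated assumptions: the function name, the default of 30 attempts with a 2 second delay, and the =ATLAS_FETCH= override are all hypothetical; by default it calls curl with =-s= (silent) and =-f= (fail on HTTP errors).

```shell
# Poll the version endpoint until the server answers or the attempts run out.
# ATLAS_FETCH is a hypothetical override hook; by default "curl -sf" is used.
wait_for_atlas() {
    url=${1:-http://localhost:21000/api/atlas/admin/version}
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ${ATLAS_FETCH:-curl -sf} "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A typical use would be =wait_for_atlas && bin/quick_start.py=, so that the sample model is loaded only once the server responds.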


 *Dashboard*

 Once Atlas is started, you can view the status of Atlas entities using the web-based
 dashboard. Open your browser at the corresponding port to use the web UI.


 *Stopping Atlas Server*
 <verbatim>
 bin/atlas_stop.py
 </verbatim>