---
id: tutorial-kerberos-hadoop
title: "Configuring Apache Druid to use Kerberized Apache Hadoop as deep storage"
sidebar_label: "Kerberized HDFS deep storage"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->


## Hadoop Setup

The following configuration files need to be copied to the Druid conf folder:

1. For HDFS as deep storage: `hdfs-site.xml`, `core-site.xml`
2. For Hadoop-based ingestion: `mapred-site.xml`, `yarn-site.xml`
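
The copy step can be sketched as below. The `HADOOP_CONF` and `DRUID_CONF` paths are placeholders, and the sketch creates temporary demo directories so it is self-contained; substitute your real Hadoop client conf folder and Druid `conf/druid/_common` folder:

```shell
# Placeholder paths -- point these at your actual install locations.
HADOOP_CONF=/tmp/hadoop-conf-demo
DRUID_CONF=/tmp/druid-common-demo

# Demo setup only: fake a Hadoop conf dir so the sketch runs anywhere.
mkdir -p "$HADOOP_CONF" "$DRUID_CONF"
touch "$HADOOP_CONF/hdfs-site.xml" "$HADOOP_CONF/core-site.xml"
touch "$HADOOP_CONF/mapred-site.xml" "$HADOOP_CONF/yarn-site.xml"

# 1. For HDFS as deep storage:
cp "$HADOOP_CONF/hdfs-site.xml" "$HADOOP_CONF/core-site.xml" "$DRUID_CONF/"
# 2. For Hadoop-based ingestion:
cp "$HADOOP_CONF/mapred-site.xml" "$HADOOP_CONF/yarn-site.xml" "$DRUID_CONF/"
```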

### HDFS Folders and permissions

1. Choose a folder name for the Druid deep storage, for example `druid`.
2. Create the folder in HDFS under the required parent folder. For example,
   `hdfs dfs -mkdir /druid`
   OR
   `hdfs dfs -mkdir /apps/druid`

3. Give the Druid processes appropriate permissions to access this folder. This ensures that Druid can create the necessary folders, such as `data` and `indexing_log`, in HDFS.
   For example, if the Druid processes run as user `root`, then

   `hdfs dfs -chown root:root /apps/druid`

   OR

   `hdfs dfs -chmod 777 /apps/druid`

Druid creates the necessary sub-folders to store segment data and indexing logs under this newly created folder.
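
The folder setup above can be collected into a small helper. Since this sketch assumes no live cluster is at hand, it only assembles and prints the `hdfs dfs` commands as a dry run; `DRUID_DIR` and `DRUID_USER` are placeholders to adjust for your environment:

```shell
# Placeholders -- substitute your deep storage folder and Druid service user.
DRUID_DIR=/apps/druid
DRUID_USER=root

# Build the commands as strings so they can be reviewed before running.
MKDIR_CMD="hdfs dfs -mkdir -p $DRUID_DIR"
CHOWN_CMD="hdfs dfs -chown $DRUID_USER:$DRUID_USER $DRUID_DIR"
CHMOD_CMD="hdfs dfs -chmod 777 $DRUID_DIR"   # alternative to chown

echo "$MKDIR_CMD"
echo "$CHOWN_CMD"
echo "$CHMOD_CMD"
```

On a real cluster you would run the printed commands (or the script without the `echo`s) as a user with HDFS admin rights.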

## Druid Setup

Edit `conf/druid/_common/common.runtime.properties` to include the HDFS properties. The folder locations are the same as those used in the example above.

### common.runtime.properties

```properties
#
# Deep storage
#
# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# OR
# druid.storage.storageDirectory=/apps/druid/segments

#
# Indexing service logs
#

# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
# OR
# druid.indexer.logs.directory=/apps/druid/indexing-logs
```

Note: Comment out the local storage and S3 storage parameters in the file.

Also include the `druid-hdfs-storage` core extension in `conf/druid/_common/common.runtime.properties`:

```properties
#
# Extensions
#

druid.extensions.directory=dist/druid/extensions
druid.extensions.hadoopDependenciesDir=dist/druid/hadoop-dependencies
druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage", "druid-kerberos"]
```
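
A quick way to catch a typo in the extension list is to grep the properties file for the required entries. The sketch below writes a demo copy of the `loadList` line so it is self-contained; point `CONF` at your real `common.runtime.properties` instead:

```shell
# Demo copy of the loadList line from above; replace with your real file path.
CONF=/tmp/common.runtime.properties.demo
echo 'druid.extensions.loadList=["mysql-metadata-storage", "druid-hdfs-storage", "druid-kerberos"]' > "$CONF"

# Count how many required extensions are absent from the loadList.
MISSING=0
for ext in druid-hdfs-storage druid-kerberos; do
  if grep -q "\"$ext\"" "$CONF"; then
    echo "$ext: present"
  else
    echo "$ext: missing"
    MISSING=$((MISSING + 1))
  fi
done
```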

### Hadoop Jars

Ensure that Druid has the jars necessary to support your Hadoop version. Find the Hadoop version with the command `hadoop version`.

If other software, such as `WanDisco`, is used alongside Hadoop, ensure that:

1. the necessary libraries are available, and
2. the requisite extensions are added to `druid.extensions.loadList` in `conf/druid/_common/common.runtime.properties`.

### Kerberos setup

Create a headless keytab that has access to the Druid data and index folders.

Edit `conf/druid/_common/common.runtime.properties` and add the following properties:

```properties
druid.hadoop.security.kerberos.principal
druid.hadoop.security.kerberos.keytab
```

For example:

```properties
druid.hadoop.security.kerberos.principal=hdfs-test@EXAMPLE.IO
druid.hadoop.security.kerberos.keytab=/etc/security/keytabs/hdfs.headless.keytab
```
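
Before restarting, it can help to sanity-check these values. The sketch below mirrors the example settings above and only verifies the principal format; the commented-out `klist` check requires Kerberos tooling and the keytab on the host, so it is not run here:

```shell
# Values mirror the example above -- substitute your real principal and keytab.
PRINCIPAL="hdfs-test@EXAMPLE.IO"
KEYTAB="/etc/security/keytabs/hdfs.headless.keytab"

# A Kerberos principal should contain a realm, i.e. look like name@REALM.
case "$PRINCIPAL" in
  *@*) PRINCIPAL_OK=1 ;;
  *)   PRINCIPAL_OK=0 ;;
esac
echo "principal format ok: $PRINCIPAL_OK"

# On the real host, also confirm the keytab is readable and lists the principal:
# [ -r "$KEYTAB" ] && klist -kt "$KEYTAB"
```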

### Restart Druid Services

With the above changes in place, restart the Druid services so that they pick up the new configuration and can use the Kerberized Hadoop cluster.