////
/**
* @@@ START COPYRIGHT @@@
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* @@@ END COPYRIGHT @@@
*/
////
[[introduction]]
= Introduction
{project-name} is a Hadoop add-on service that provides transactional SQL on top of HBase. Typically, you
use {project-name} as the database for applications that require Online Transaction Processing (OLTP),
Operational Data Store (ODS), and/or strong reporting capabilities. You access {project-name} using
standard JDBC and ODBC APIs.

You may choose whether to add {project-name} to an existing Hadoop environment or to create a standalone
Hadoop environment specifically for {project-name}.

This guide assumes that a Hadoop environment exists onto which you are provisioning {project-name}. Refer to
<<requirements-hadoop-software,Hadoop Software>> for information about the Hadoop software required
by {project-name}.
[[introduction-security-considerations]]
== Security Considerations
The following users and principals need to be considered for {project-name}:
* *Provisioning User*: A Linux-level user that performs the {project-name} provisioning tasks. This user ID
requires `sudo` access and passwordless ssh among the nodes where {project-name} is installed. In addition,
this user ID requires access to the administrative users of the Hadoop distribution, HDFS, and HBase in order to change
each environment's configuration settings per {project-name} requirements. Refer to
<<requirements-trafodion-provisioning-user,{project-name} Provisioning User>> for more information
about the requirements and usage associated with this user ID.
* *Runtime User*: A Linux-level user under which the {project-name} software runs; the default name is `trafodion`. This user ID must be registered
as a user in the Hadoop Distributed File System (HDFS) to store and access objects in HDFS, HBase, and Hive.
In addition, this user ID requires passwordless ssh access among the nodes where {project-name} is installed.
Refer to <<requirements-trafodion-runtime-user,{project-name} Runtime User>> for more information about this user ID.
* *{project-name} Database Users*: {project-name} users are managed by {project-name} security features (grant, revoke, etc.),
which can be integrated with LDAP if so desired. These users are referred to as *database users* and
do not have direct access to the operating system. Refer to <<enable-security-ldap,LDAP>> for
details on enabling LDAP for authenticating database users.
Refer to {docs-url}/sql_reference/index.html#register_user_statement[Register User],
{docs-url}/sql_reference/index.html#grant_statement[Grant], and other SQL statements
in the {docs-url}/sql_reference/index.html[{project-name} SQL Reference Manual] for
more information about managing {project-name} Database Users.
+
If your environment has been provisioned with Kerberos, then the following additional information is required
(a verification sketch is shown at the end of this section).
* *KDC admin principal*: {project-name} requires administrator access to Kerberos to create principals
and keytabs for the `trafodion` user, and to look up principal names for HDFS and HBase keytabs. Refer to
<<enable-security-kerberos,Kerberos>> for more information about the requirements and usage associated with this principal.
* *HBase keytab location*: {project-name} requires administrator access to HBase to grant required privileges to the `trafodion` user. Refer to
<<enable-security-kerberos,Kerberos>> for more information about the requirements and usage associated with this keytab.
* *HDFS keytab location*: {project-name} requires administrator access to HDFS to create directories that store files needed to perform SQL requests
such as data loads and backups. Refer to
<<enable-security-kerberos,Kerberos>> for more information about the requirements and usage associated with this keytab.
+
If your environment is using LDAP for authentication, then the following additional information is required.
* *LDAP username for database root access*: When {project-name} is installed, it creates a predefined database user referred to as the DB\__ROOT user.
In order to connect to the database as database root, there must be a mapping between the database user DB\__ROOT and an LDAP user. Refer to
<<enable-security-ldap,LDAP>> for more information about this option.
* *LDAP search user name*: {project-name} optionally requests an LDAP username and password in order to perform LDAP operations
such as LDAP search. Refer to
<<enable-security-ldap,LDAP>> for more information about this option.
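If Kerberos is enabled, you can confirm the admin principal and keytab locations before provisioning begins.
The following is a minimal sketch; the keytab paths and the realm are examples only (the realm matches the
sample used later in this guide), so adjust them for your environment:

```
# Verify that the KDC admin principal can authenticate (prompts for the admin password)
$ kinit admin/admin@EXAMPLE.COM

# List the entries in the HBase and HDFS keytabs (paths are examples; your distribution may differ)
$ sudo klist -kt /etc/security/keytabs/hbase.service.keytab
$ sudo klist -kt /etc/security/keytabs/hdfs.headless.keytab
```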
[[introduction-provisioning-options]]
== Provisioning Options
{project-name} includes two options for installation: a plug-in integration with Apache Ambari and command-line installation scripts.
The Ambari integration supports Hortonworks Hadoop distributions, while the command-line {project-name} Installer
supports Cloudera and Hortonworks Hadoop distributions as well as select vanilla Hadoop installations.
The {project-name} Installer supports the SUSE and RedHat/CentOS Linux distributions. There are, however, some differences between them;
for example, prerequisite software packages are not installed automatically on SUSE.
The {project-name} Installer automates many of the tasks required to install/upgrade {project-name}: downloading and
installing required software packages, making required configuration changes to your Hadoop environment, creating
the {project-name} runtime user ID, and installing and starting {project-name}. It is, therefore, highly recommended that
you use the {project-name} Installer for initial installations and upgrades of {project-name}. These steps are referred to as
"Script-Based Provisioning" in this guide. Refer to <<introduction-trafodion-installer, {project-name} Installer>> for
usage information.
[[introduction-provisioning-activities]]
== Provisioning Activities
{project-name} provisioning is divided into the following main activities:
* *<<requirements,Requirements>>*: Activities and documentation required to install the {project-name} software.
These activities include tasks such as understanding hardware and operating system requirements,
Hadoop requirements, which software packages need to be downloaded, configuration settings that need to be changed,
and user ID requirements.
* *<<prepare,Prepare>>*: Activities to prepare the operating system and the Hadoop ecosystem to run
{project-name}. These activities include tasks such as installing required software packages, configuring
the {project-name} Installation User, gathering information about the Hadoop environment, and modifying the
configuration of different Hadoop services.
* *<<install,Install>>*: Activities related to installing the {project-name} software. These activities
include tasks such as unpacking the {project-name} tar files, creating the {project-name} Runtime User,
creating {project-name} HDFS directories, installing the {project-name} software, and enabling security features.
* *<<upgrade,Upgrade>>*: Activities related to upgrading the {project-name} software. These activities
include tasks such as shutting down {project-name} and installing a new version of the {project-name} software.
The upgrade tasks vary depending on the differences between the current and new release of
{project-name}. For example, an upgrade may or may not include an upgrade of the {project-name} metadata.
* *<<activate,Activate>>*: Activities related to starting the {project-name} software. These activities
include basic management tasks such as starting and checking the status of the {project-name} components and performing basic smoke tests.
* *<<remove,Remove>>*: Activities related to removing {project-name} from your Hadoop cluster.
[[introduction-provisioning-master-node]]
== Provisioning Master Node
All provisioning tasks are performed from a single node in the cluster, which can be any node
as long as it has access to the Hadoop environment you're adding {project-name} to.
This node is referred to as the "*Provisioning Master Node*" in this guide.
The {project-name} Provisioning User must have access to all other nodes from the Provisioning
Master Node in order to perform provisioning tasks on the cluster.
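Before starting, you may want to verify that the Provisioning User can reach every node from the
Provisioning Master Node without being prompted for a password. A minimal sketch, assuming hypothetical
node names `node01` through `node03`:

```
# Check passwordless ssh and passwordless sudo on each node; replace the node names with your own
$ for node in node01 node02 node03; do
>   ssh -o BatchMode=yes $node "sudo -n true" && echo "$node OK"
> done
```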
[[introduction-trafodion-installer]]
== {project-name} Installer
The {project-name} Installer is a set of scripts that automates most of the tasks required to install/upgrade {project-name}.
You download the {project-name} Installer tar file from the {project-name} {download-url}[download] page.
Next, you unpack the tar file.
*Example*
```
$ mkdir $HOME/trafodion-installer
$ cd $HOME/trafodion-downloads
$ tar -zxf apache-trafodion-pyinstaller-2.2.0.tar.gz -C $HOME/trafodion-installer
$
```
<<<
The {project-name} Installer supports two different modes:
1. *Guided Setup*: Prompts for information as it works through the installation/upgrade process. This mode is recommended for new users.
2. *Automated Setup*: Required information is provided in a pre-formatted ini configuration file, which is supplied
via a command-line argument when running the {project-name} Installer, thereby suppressing all prompts. This ini configuration file exists only
on the *Provisioning Master Node*; secure or delete it after you have installed {project-name} successfully.
+
A template of the configuration file is available within the installer directory: `configs/db_config_default.ini`.
Make a copy of the file in your directory and populate it with the needed information.
+
Automated Setup is recommended since it allows you to record the required provisioning information ahead of time.
Refer to <<introduction-trafodion-installer-automated-setup,Automated Setup>> for information about how to
populate this file. Invocation of both modes is shown below.
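A minimal sketch, run from the directory where you unpacked the installer and assuming a configuration
file named `my_config` (created as described later in this chapter):

```
# Guided Setup: the installer prompts for all required information
$ ./db_install.py

# Automated Setup: the installer reads all settings from an ini configuration file
$ ./db_install.py --config-file my_config
```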
[[introduction-trafodion-installer-usage]]
=== Usage
The following shows help for the {project-name} Installer.
```
$ ./db_install.py -h
**********************************
  Trafodion Installation ToolKit
**********************************
Usage: db_install.py [options]

Trafodion install main script.

Options:
  -h, --help            show this help message and exit
  -c FILE, --config-file=FILE
                        Json format file. If provided, all install prompts
                        will be taken from this file and not prompted for.
  -u USER, --remote-user=USER
                        Specify ssh login user for remote server,
                        if not provided, use current login user as default.
  -v, --verbose         Verbose mode, will print commands.
  --silent              Do not ask user to confirm configuration result
  --enable-pwd          Prompt SSH login password for remote hosts.
                        If set, 'sshpass' tool is required.
  --build               Build the config file in guided mode only.
  --reinstall           Reinstall Trafodion without restarting Hadoop.
  --apache-hadoop       Install Trafodion on top of Apache Hadoop.
  --offline             Enable local repository for offline installing
                        Trafodion.
```
<<<
[[introduction-trafodion-installer-install-vs-upgrade]]
=== Install vs. Upgrade
The {project-name} Installer automatically detects whether you're performing an install
or an upgrade by looking for the {project-name} Runtime User in the `/etc/passwd` file.
* If the user ID doesn't exist, then the {project-name} Installer runs in install mode.
* If the user ID exists, then the {project-name} Installer runs in upgrade mode.
* If the `--reinstall` option is specified, then the {project-name} Installer does not restart Hadoop. This option is only available when
you reinstall the same release version; otherwise, an error is reported during installation.
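Because the detection is based on `/etc/passwd`, you can check ahead of time which mode will run.
A minimal sketch, assuming the default Runtime User name `trafodion`:

```
# Prints a passwd entry if the user exists (upgrade mode); prints nothing otherwise (install mode)
$ getent passwd trafodion
```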
[[introduction-trafodion-installer-guided-setup]]
=== Guided Setup
By default, the {project-name} Installer runs in Guided Setup mode, which means
that it prompts you for information during the install/upgrade process.
Refer to the following sections for examples:
* <<install-guided-install, Guided Install>>
* <<upgrade-guided-upgrade, Guided Upgrade>>
[[introduction-trafodion-installer-automated-setup]]
=== Automated Setup
The `--config-file` option runs the {project-name} Installer in Automated Setup mode.
Before running the {project-name} Installer with this option, do the following:
1. Copy the `db_config_default.ini` file.
+
*Example*
+
```
cp configs/db_config_default.ini my_config
```
2. Edit the new file using information you collect in the
<<prepare-gather-configuration-information,Gather Configuration Information>>
section in the <<prepare,Prepare>> chapter.
3. Run the {project-name} Installer in Automated Setup mode.
+
*Example*
+
```
./db_install.py --config-file my_config
```
NOTE: Your {project-name} Configuration File contains the password for the {project-name} Runtime User
and for the Distribution Manager. Therefore, we recommend that you secure the file in a manner
that matches the security policies of your organization.
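For example, one way to restrict access to the file on the Provisioning Master Node, using the
`my_config` copy created above:

```
# Make the file readable and writable by the Provisioning User only
$ chmod 600 my_config

# Or remove it entirely once the installation has completed successfully
$ rm my_config
```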
==== Example: Quick start using a {project-name} Configuration File
The {project-name} Installer supports a minimal configuration to quick-start your installation in two steps.
1. Copy the {project-name} server binary file to your installer directory.
```
cp /path/to/apache-trafodion_server-2.2.0-RH-x86_64.tar.gz python-installer/
```
2. Modify the configuration file `my_config`, adding the Hadoop Distribution Manager URL as the `mgr_url` value.
```
mgr_url = 192.168.0.1:8080
```
Once completed, run the {project-name} Installer with the `--config-file` option.
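For example, using the `my_config` file from step 2:

```
$ ./db_install.py --config-file my_config
```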
==== Example: Creating a {project-name} Configuration File
Using the instructions in <<prepare-gather-configuration-information,Gather Configuration Information>>
in the <<prepare,Prepare>> chapter, record the information and edit `my_config` to contain the following:
```
[dbconfigs]
# NOTICE: if you are using CDH/HDP hadoop distro,
# you can only specify the management url address for a quick install
##################################
# Common Settings
##################################
# trafodion username and password
traf_user = trafodion
traf_pwd = traf123
# trafodion user's home directory
home_dir = /home
# the directory location of trafodion binary
# if not provided, the default value will be {package_name}-{version}
traf_dirname =
# trafodion used java(JDK) path on trafodion nodes
# if not provided, installer will auto detect installed JDK
java_home =
# cloudera/ambari management url(i.e. http://192.168.0.1:7180 or just 192.168.0.1)
# if 'http' or 'https' prefix is not provided, the default one is 'http'
# if port is not provided, the default port is cloudera port '7180'
mgr_url = 192.168.0.1:8080
# user name for cloudera/ambari management url
mgr_user = admin
# password for cloudera/ambari management url
mgr_pwd = admin
# set the cluster number if multiple clusters managed by one Cloudera manager
# ignore it if only one cluster being managed
cluster_no = 1
# trafodion tar package file location
# no need to provide it if the package can be found in current installer's directory
traf_package =
# the number of dcs servers on each node
dcs_cnt_per_node = 4
# scratch file location, separated by comma if more than one
scratch_locs = $TRAF_VAR
# start trafodion instance after installation completed
traf_start = Y
##################################
# DCS HA configuration
##################################
# set it to 'Y' if enable DCS HA
dcs_ha = N
# if HA is enabled, provide floating ip, network interface and the hostname of backup dcs master nodes
dcs_floating_ip =
# network interface that dcs used
dcs_interface =
# backup dcs master nodes, separated by comma if more than one
dcs_backup_nodes =
##################################
# Offline installation setting
##################################
# set offline mode to Y if no internet connection
offline_mode = N
# if offline mode is set, you must provide a local repository directory with all needed RPMs
local_repo_dir =
##################################
# LDAP security configuration
##################################
# set it to 'Y' if enable LDAP security
ldap_security = N
# LDAP user name and password to be assigned as DB admin privilege
db_admin_user = admin
db_admin_pwd = traf123
# LDAP user to be assigned DB root privileges (DB__ROOT)
db_root_user = trafodion
# if LDAP security is enabled, provide the following items
ldap_hosts =
# 389 for no encryption or TLS, 636 for SSL
ldap_port = 389
ldap_identifiers =
ldap_encrypt = 0
ldap_certpath =
# set to Y if user info is needed
ldap_userinfo = N
# provide if ldap_userinfo = Y
ladp_user =
ladp_pwd =
##################################
# Kerberos security configuration
##################################
# if kerberos is enabled in your hadoop system, provide below info
# KDC server address
kdc_server =
# include realm, e.g. admin/admin@EXAMPLE.COM
admin_principal =
# admin password for admin principal, it is used to create trafodion user's principal and keytab
kdcadmin_pwd =
```
Once completed, run the {project-name} Installer with the `--config-file` option.
Refer to the following sections for examples:
* <<install-automated-install, Automated Install>>
* <<upgrade-automated-upgrade, Automated Upgrade>>
[[introduction-trafodion-provisioning-directories]]
== {project-name} Provisioning Directories
{project-name} stores its provisioning information in the following directories on each node in the cluster:
* `/etc/trafodion`: Configuration information.