blob: e4106b1a06b8251fae7a8e49267c6703edf19572 [file] [log] [blame]
---
title: Amazon EC2 Configuration
---
Amazon Elastic Compute Cloud (EC2) is a service provided by Amazon Web Services (AWS). You can install and configure HAWQ on virtual servers provided by Amazon EC2. The following information describes some considerations when deploying a HAWQ cluster in an Amazon EC2 environment.
## <a id="topic_wqv_yfx_y5"></a>About Amazon EC2
Amazon EC2 can be used to launch as many virtual servers as you need, configure security and networking, and manage storage. An EC2 *instance* is a virtual server in the AWS cloud virtual computing environment.
EC2 instances are managed by AWS. AWS isolates your EC2 instances from other users in a virtual private cloud (VPC) and lets you control access to the instances. You can configure instance features such as operating system, network connectivity (network ports and protocols, IP addresses), access to the Internet, and size and type of disk storage.
For information about Amazon EC2, see the [EC2 User Guide](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html).
## <a id="topic_nhk_df4_2v"></a>Create and Launch HAWQ Instances
Use the *Amazon EC2 Console* to launch instances and configure, start, stop, and terminate (delete) virtual servers. When you launch a HAWQ instance, you select and configure key attributes via the EC2 Console.
### <a id="topic_amitype"></a>Choose AMI Type
An Amazon Machine Image (AMI) is a template that contains a software configuration including the operating system, application server, and applications that best suit your purpose. When configuring a HAWQ virtual instance, we recommend you use a *hardware virtualized* AMI running 64-bit Red Hat Enterprise Linux version 6.4 or 6.5 or 64-bit CentOS 6.4 or 6.5. Obtain the licenses and instances directly from the OS provider.
### <a id="topic_selcfgstorage"></a>Consider Storage
EC2 instances can be launched as either Elastic Block Store (EBS)-backed or instance store-backed.
Instance store-backed storage is generally better performing than EBS and recommended for HAWQ's large data workloads. SSD (solid state) instance store is preferred over magnetic drives.
**Note** EC2 *instance store* provides temporary block-level storage. This storage is located on disks that are physically attached to the host computer. While instance store provides high performance, powering off the instance causes data loss. Soft reboots preserve instance store data.
Virtual devices for instance store volumes for HAWQ EC2 instance store instances are named `ephemeralN` (where *N* varies based on instance type). CentOS instance store block device are named `/dev/xvdletter` (where *letter* is a lower case letter of the alphabet).
### <a id="topic_cfgplacegrp"></a>Configure Placement Group
A placement group is a logical grouping of instances within a single availability zone that together participate in a low-latency, 10 Gbps network. Your HAWQ master and segment cluster instances should support enhanced networking and reside in a single placement group (and subnet) for optimal network performance.
If your Ambari node is not a data node, locating the Ambari node instance in a subnet separate from the HAWQ master/segment placement group enables you to manage multiple HAWQ clusters from the single Ambari instance.
Amazon recommends that you use the same instance type for all instances in the placement group and that you launch all instances within the placement group at the same time.
Membership in a placement group has some implications on your HAWQ cluster. Specifically, growing the cluster over capacity may require shutting down all HAWQ instances in the current placement group and restarting the instances to a new placement group. Instance store volumes are lost in this scenario.
### <a id="topic_selinsttype"></a>Select EC2 Instance Type
An EC2 instance type is a specific combination of CPU, memory, default storage, and networking capacity.
Several instance store-backed EC2 instance types have shown acceptable performance for HAWQ nodes in development and production environments:
| Instance Type | Env | vCPUs | Memory (GB) | Disk Capacity (GB) | Storage Type |
|-------|-----|------|--------|----------|--------|
| cc2.8xlarge | Dev | 32 | 60.5 | 4 x 840 | HDD |
| d2.2xlarge | Dev | 8 | 60 | 6 x 2000 | HDD |
| d2.4xlarge | Dev/QA | 16 | 122 | 12 x 2000 | HDD |
| i2.8xlarge | Prod | 32 | 244 | 8 x 800 | SSD |
| hs1.8xlarge | Prod | 16 | 117 | 24 x 2000 | HDD |
| d2.8xlarge | Prod | 36 | 244 | 24 x 2000 | HDD |
For optimal network performance, the chosen HAWQ instance type should support EC2 enhanced networking. Enhanced networking results in higher performance, lower latency, and lower jitter. Refer to [Enhanced Networking on Linux Instances](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) for detailed information on enabling enhanced networking in your instances.
All instance types identified in the table above support enhanced networking.
### <a id="topic_cfgnetw"></a>Configure Networking
Your HAWQ cluster instances should be in a single VPC and on the same subnet. Instances are always assigned a VPC internal IP address. This internal IP address should be used for HAWQ communication between hosts. You can also use the internal IP address to access an instance from another instance within the HAWQ VPC.
You may choose to locate your Ambari node on a separate subnet in the VPC. Both a public IP address for the instance and an Internet gateway configured for the EC2 VPC are required to access the Ambari instance from an external source and for the instance to access the Internet.
Ensure your Ambari and HAWQ master instances are each assigned a public IP address for external and internet access. We recommend you also assign an Elastic IP Address to the HAWQ master instance.
###Configure Security Groups<a id="topic_cfgsecgrp"></a>
A security group is a set of rules that control network traffic to and from your HAWQ instance. One or more rules may be associated with a security group, and one or more security groups may be associated with an instance.
To configure HAWQ communication between nodes in the HAWQ cluster, include and open the following ports in the appropriate security group for the HAWQ master and segment nodes:
| Port | Application |
|-------|-------------------------------------|
| 22 | ssh - secure connect to other hosts |
To allow access to/from a source external to the Ambari management node, include and open the following ports in an appropriate security group for your Ambari node:
| Port | Application |
|-------|-------------------------------------|
| 22 | ssh - secure connect to other hosts |
| 8080 | Ambari - HAWQ admin/config web console |
###Generate Key Pair<a id="topic_cfgkeypair"></a>
AWS uses public-key cryptography to secure the login information for your instance. You use the EC2 console to generate and name a key pair when you launch your instance.
A key pair for an EC2 instance consists of a *public key* that AWS stores, and a *private key file* that you maintain. Together, they allow you to connect to your instance securely. The private key file name typically has a `.pem` suffix.
This example logs into an into EC2 instance from an external location with the private key file `my-test.pem` as user `user1`. In this example, the instance is configured with the public IP address `192.0.2.0` and the private key file resides in the current directory.
```shell
$ ssh -i my-test.pem user1@192.0.2.0
```
##Additional HAWQ Considerations <a id="topic_mj4_524_2v"></a>
After launching your HAWQ instance, you will connect to and configure the instance. The *Instances* page of the EC2 Console lists the running instances and their associated network access information.
Before installing HAWQ, set up the EC2 instances as you would local host server machines. Configure the host operating system, configure host network information (for example, update the `/etc/hosts` file), set operating system parameters, and install operating system packages. For information about how to prepare your operating system environment for HAWQ, see [Apache HAWQ System Requirements](../requirements/system-requirements.html) and [Select HAWQ Host Machines](../install/select-hosts.html).
###Passwordless SSH Configuration<a id="topic_pwdlessssh_cc"></a>
HAWQ hosts will be configured during the installation process to use passwordless SSH for intra-cluster communications. Temporary password-based authentication must be enabled on each HAWQ host in preparation for this configuration. Password authentication is typically disabled by default in cloud images. Update the cloud configuration in `/etc/cloud/cloud.cfg` to enable password authentication in your AMI(s). Set `ssh_pwauth: True` in this file. If desired, disable password authentication after HAWQ installation by setting the property back to `False`.
##References<a id="topic_hgz_zwy_bv"></a>
Links to related Amazon Web Services and EC2 features and information.
- [Amazon Web Services](https://aws.amazon.com)
- [Amazon Machine Image \(AMI\)](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html)
- [EC2 Instance Store](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html)
- [Elastic Block Store](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html)
- [EC2 Key Pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
- [Elastic IP Address](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html)
- [Enhanced Networking on Linux Instances](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html)
- [Internet Gateways] (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Internet_Gateway.html)
- [Subnet Public IP Addressing](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html#subnet-public-ip)
- [Virtual Private Cloud](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Introduction.html)