blob: 2056207747eb1751c56b4caa6e6bc4de77842847 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information#
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
Choosing a Deployment Architecture
==================================
The architecture used in a deployment will vary depending on the size
and purpose of the deployment. This section contains examples of
deployment architecture, including a small-scale deployment useful for
test and trial deployments and a fully-redundant large-scale setup for
production deployments.
Small-Scale Deployment
----------------------
|Small-Scale Deployment|
This diagram illustrates the network architecture of a small-scale
CloudStack deployment.
- A firewall provides a connection to the Internet. The firewall is
configured in NAT mode. The firewall forwards HTTP requests and API
calls from the Internet to the Management Server. The Management
Server resides on the management network.
- A layer-2 switch connects all physical servers and storage.
- A single NFS server functions as both the primary and secondary
storage.
- The Management Server is connected to the management network.
Large-Scale Redundant Setup
---------------------------
|Large-Scale Redundant Setup|
This diagram illustrates the network architecture of a large-scale
CloudStack deployment.
- A layer-3 switching layer is at the core of the data center. A router
redundancy protocol like VRRP should be deployed. Typically high-end
core switches also include firewall modules. Separate firewall
appliances may also be used if the layer-3 switch does not have
integrated firewall capabilities. The firewalls are configured in NAT
mode. The firewalls provide the following functions:
- Forwards HTTP requests and API calls from the Internet to the
Management Server. The Management Server resides on the management
network.
- When the cloud spans multiple zones, the firewalls should enable
site-to-site VPN such that servers in different zones can directly
reach each other.
- A layer-2 access switch layer is established for each pod. Multiple
switches can be stacked to increase port count. In either case,
redundant pairs of layer-2 switches should be deployed.
- The Management Server cluster (including front-end load balancers,
Management Server nodes, and the MySQL database) is connected to the
management network through a pair of load balancers.
- Secondary storage servers are connected to the management network.
- Each pod contains storage and computing servers. Each storage and
computing server should have redundant NICs connected to separate
layer-2 access switches.
Separate Storage Network
------------------------
In the large-scale redundant setup described in the previous section,
storage traffic can overload the management network. A separate storage
network is optional for deployments. Storage protocols such as iSCSI are
sensitive to network delays. A separate storage network ensures guest
network traffic contention does not impact storage performance.
Multi-Node Management Server
----------------------------
The CloudStack Management Server is deployed on one or more front-end
servers connected to a single MySQL database. Optionally a pair of
hardware load balancers distributes requests from the web. A backup
management server set may be deployed using MySQL replication at a
remote site to add DR capabilities.
|Multi-Node Management Server|
The administrator must decide the following.
- Whether or not load balancers will be used.
- How many Management Servers will be deployed.
- Whether MySQL replication will be deployed to enable disaster
recovery.
Multi-Site Deployment
---------------------
The CloudStack platform scales well into multiple sites through the use
of zones. The following diagram shows an example of a multi-site
deployment.
|Example Of A Multi-Site Deployment|
Data Center 1 houses the primary Management Server as well as zone 1.
The MySQL database is replicated in real time to the secondary
Management Server installation in Data Center 2.
|Separate Storage Network|
This diagram illustrates a setup with a separate storage network. Each
server has four NICs, two connected to pod-level network switches and
two connected to storage network switches.
There are two ways to configure the storage network:
- Bonded NIC and redundant switches can be deployed for NFS. In NFS
deployments, redundant switches and bonded NICs still result in one
network (one CIDR block+ default gateway address).
- iSCSI can take advantage of two separate storage networks (two CIDR
blocks each with its own default gateway). Multipath iSCSI client can
failover and load balance between separate storage networks.
|NIC Bonding And Multipath I/O|
This diagram illustrates the differences between NIC bonding and
Multipath I/O (MPIO). NIC bonding configuration involves only one
network. MPIO involves two separate networks.
Choosing a Hypervisor
---------------------
CloudStack supports many popular hypervisors. Your cloud can consist
entirely of hosts running a single hypervisor, or you can use multiple
hypervisors. Each cluster of hosts must run the same hypervisor.
You might already have an installed base of nodes running a particular
hypervisor, in which case, your choice of hypervisor has already been
made. If you are starting from scratch, you need to decide what
hypervisor software best suits your needs. A discussion of the relative
advantages of each hypervisor is outside the scope of our documentation.
However, it will help you to know which features of each hypervisor are
supported by CloudStack. The following table provides this information.
.. cssclass:: table-striped table-bordered table-hover
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Feature | XenServer | vSphere | KVM - RHEL | LXC | HyperV | Bare Metal |
+==================================+===========+==============+============+=====+========+============+
| Network Throttling | Yes | Yes | No | No | ? | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Security groups in zones that use| Yes | No | Yes | Yes | ? | No |
| basic networking | | | | | | |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| iSCSI | Yes | Yes | Yes | Yes | Yes | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| FibreChannel | Yes | Yes | Yes | Yes | Yes | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Local Disk | Yes | Yes | Yes | Yes | Yes | Yes |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| HA | Yes | Yes (Native) | Yes | ? | Yes | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Snapshots of local disk | Yes | Yes | Yes | ? | ? | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Local disk as data disk | Yes | No | Yes | Yes | Yes | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Work load balancing | No | DRS | No | No | ? | N/A |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Manual live migration of VMs from| Yes | Yes | Yes | ? | Yes | N/A |
| host to host | | | | | | |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
| Conserve management traffic IP | Yes | No | Yes | Yes | ? | N/A |
| address by using link local | | | | | | |
| network to communicate with | | | | | | |
| virtual router | | | | | | |
+----------------------------------+-----------+--------------+------------+-----+--------+------------+
Hypervisor Support for Primary Storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following table shows storage options and parameters for different
hypervisors.
.. cssclass:: table-striped table-bordered table-hover
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| Primary Storage Type | XenServer | vSphere | KVM - RHEL | LXC | HyperV |
+==================================+=============+===============+================+================+========+
| Format for Disks, Templates, | VHD | VMDK | QCOW2 | | VHD |
| and Snapshots | | | | | |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| iSCSI support | CLVM | VMFS | Yes via Shared | Yes via Shared | No |
| | | | Mountpoint | Mountpoint | |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| Fiber Channel support | Yes, Via | VMFS | Yes via Shared | Yes via Shared | No |
| | existing SR | | Mountpoint | Mountpoint | |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| NFS support | Yes | Yes | Yes | Yes | No |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| Local storage support | Yes | Yes | Yes | Yes | Yes |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| Storage over-provisioning | NFS | NFS and iSCSI | NFS | | No |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
| SMB/CIFS | No | No | No | No | Yes |
+----------------------------------+-------------+---------------+----------------+----------------+--------+
XenServer uses a clustered LVM system to store VM images on iSCSI and
Fiber Channel volumes and does not support over-provisioning in the
hypervisor. The storage server itself, however, can support
thin-provisioning. As a result the CloudStack can still support storage
over-provisioning by running on thin-provisioned storage volumes.
KVM supports "Shared Mountpoint" storage. A shared mountpoint is a file
system path local to each server in a given cluster. The path must be
the same across all Hosts in the cluster, for example /mnt/primary1.
This shared mountpoint is assumed to be a clustered filesystem such as
OCFS2. In this case the CloudStack does not attempt to mount or unmount
the storage as is done with NFS. The CloudStack requires that the
administrator insure that the storage is available
With NFS storage, CloudStack manages the overprovisioning. In this case
the global configuration parameter storage.overprovisioning.factor
controls the degree of overprovisioning. This is independent of
hypervisor type.
Local storage is an option for primary storage for vSphere, XenServer,
and KVM. When the local disk option is enabled, a local disk storage
pool is automatically created on each host. To use local storage for the
System Virtual Machines (such as the Virtual Router), set
system.vm.use.local.storage to true in global configuration.
CloudStack supports multiple primary storage pools in a Cluster. For
example, you could provision 2 NFS servers in primary storage. Or you
could provision 1 iSCSI LUN initially and then add a second iSCSI LUN
when the first approaches capacity.
Best Practices
--------------
Deploying a cloud is challenging. There are many different technology
choices to make, and CloudStack is flexible enough in its configuration
that there are many possible ways to combine and configure the chosen
technology. This section contains suggestions and requirements about
cloud deployments.
These should be treated as suggestions and not absolutes. However, we do
encourage anyone planning to build a cloud outside of these guidelines
to seek guidance and advice on the project mailing lists.
Process Best Practices
~~~~~~~~~~~~~~~~~~~~~~
- A staging system that models the production environment is strongly
advised. It is critical if customizations have been applied to
CloudStack.
- Allow adequate time for installation, a beta, and learning the
system. Installs with basic networking can be done in hours. Installs
with advanced networking usually take several days for the first
attempt, with complicated installations taking longer. For a full
production system, allow at least 4-8 weeks for a beta to work
through all of the integration issues. You can get help from fellow
users on the cloudstack-users mailing list.
Setup Best Practices
~~~~~~~~~~~~~~~~~~~~
- Each host should be configured to accept connections only from
well-known entities such as the CloudStack Management Server or your
network monitoring software.
- Use multiple clusters per pod if you need to achieve a certain switch
density.
- Primary storage mountpoints or LUNs should not exceed 6 TB in size.
It is better to have multiple smaller primary storage elements per
cluster than one large one.
- When exporting shares on primary storage, avoid data loss by
restricting the range of IP addresses that can access the storage.
See "Linux NFS on Local Disks and DAS" or "Linux NFS on iSCSI".
- NIC bonding is straightforward to implement and provides increased
reliability.
- 10G networks are generally recommended for storage access when larger
servers that can support relatively more VMs are used.
- Host capacity should generally be modeled in terms of RAM for the
guests. Storage and CPU may be overprovisioned. RAM may not. RAM is
usually the limiting factor in capacity designs.
- (XenServer) Configure the XenServer dom0 settings to allocate more
memory to dom0. This can enable XenServer to handle larger numbers of
virtual machines. We recommend 2940 MB of RAM for XenServer dom0. For
instructions on how to do this, see
`http://support.citrix.com/article/CTX126531
<http://support.citrix.com/article/CTX126531>`_.
The article refers to XenServer 5.6, but the same information applies
to XenServer 6.0.
Maintenance Best Practices
~~~~~~~~~~~~~~~~~~~~~~~~~~
- Monitor host disk space. Many host failures occur because the host's
root disk fills up from logs that were not rotated adequately.
- Monitor the total number of VM instances in each cluster, and disable
allocation to the cluster if the total is approaching the maximum
that the hypervisor can handle. Be sure to leave a safety margin to
allow for the possibility of one or more hosts failing, which would
increase the VM load on the other hosts as the VMs are redeployed.
Consult the documentation for your chosen hypervisor to find the
maximum permitted number of VMs per host, then use CloudStack global
configuration settings to set this as the default limit. Monitor the
VM activity in each cluster and keep the total number of VMs below a
safe level that allows for the occasional host failure. For example,
if there are N hosts in the cluster, and you want to allow for one
host in the cluster to be down at any given time, the total number of
VM instances you can permit in the cluster is at most (N-1) \*
(per-host-limit). Once a cluster reaches this number of VMs, use the
CloudStack UI to disable allocation to the cluster.
.. warning::
The lack of up-do-date hotfixes can lead to data corruption and lost VMs.
Be sure all the hotfixes provided by the hypervisor vendor are applied. Track
the release of hypervisor patches through your hypervisor vendor’s support
channel, and apply patches as soon as possible after they are released.
CloudStack will not track or notify you of required hypervisor patches. It is
essential that your hosts are completely up to date with the provided
hypervisor patches. The hypervisor vendor is likely to refuse to support any
system that is not up to date with patches.
.. |Small-Scale Deployment| image:: ./_static/images/small-scale-deployment.png
.. |Large-Scale Redundant Setup| image:: ./_static/images/large-scale-redundant-setup.png
.. |Multi-Node Management Server| image:: ./_static/images/multi-node-management-server.png
.. |Example Of A Multi-Site Deployment| image:: ./_static/images/multi-site-deployment.png
.. |Separate Storage Network| image:: ./_static/images/separate-storage-network.png
.. |NIC Bonding And Multipath I/O| image:: ./_static/images/nic-bonding-and-multipath-io.png