| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| http://www.apache.org/licenses/LICENSE-2.0 |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| |
| HA for Management Server |
| ------------------------ |
| |
| The CloudStack Management Server should be deployed in a multi-node |
| configuration such that it is not susceptible to individual server |
| failures. The Management Server itself (as distinct from the MySQL |
| database) is stateless and may be placed behind a load balancer. |
| |
| Normal operation of Hosts is not impacted by an outage of all Management |
| Servers. All guest VMs will continue to work. |
| |
| When the Management Server is down, no new VMs can be created, and the |
| end user and admin UI, API, dynamic load distribution, and HA will cease |
| to work. |
| |
| .. _management-server-load-balancing: |
| |
| Management Server Load Balancing |
| -------------------------------- |
| |
| CloudStack can use a load balancer to provide a virtual IP for multiple |
| Management Servers. The administrator is responsible for creating the |
| load balancer rules for the Management Servers. The application requires |
| persistence or stickiness across multiple sessions. The following chart |
| lists the ports that should be load balanced and whether or not |
| persistence is required. |
| |
| Even if persistence is not required, enabling it is permitted. |
| |
| .. cssclass:: table-striped table-bordered table-hover |
| |
| ============== ======================== ================ ===================== |
| Source Port Destination Port Protocol Persistence Required? |
| ============== ======================== ================ ===================== |
| 80 or 443 8080 (or 20400 with AJP) HTTP (or AJP) Yes |
| 8250 8250 TCP Yes |
| 8096 8096 HTTP No |
| ============== ======================== ================ ===================== |
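| |
| For illustration only, a minimal HAProxy configuration following the |
| table above might look like the sketch below (the choice of HAProxy, |
| the server names and the IP addresses are examples, not a CloudStack |
| requirement; a suitable global/defaults section is assumed):: |
| |
|    # UI/API traffic: port 80/443 on the VIP -> 8080 on the servers |
|    frontend cloudstack_ui |
|        bind *:80 |
|        mode http |
|        default_backend cloudstack_mgmt |
| |
|    backend cloudstack_mgmt |
|        mode http |
|        balance source          # source-IP stickiness (persistence) |
|        server ms1 10.0.0.11:8080 check |
|        server ms2 10.0.0.12:8080 check |
| |
|    # Agent traffic: port 8250 on the VIP -> 8250 on the servers |
|    listen cloudstack_agents |
|        bind *:8250 |
|        mode tcp |
|        balance source          # persistence is required here too |
|        server ms1 10.0.0.11:8250 check |
|        server ms2 10.0.0.12:8250 check |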
| |
| In addition to the above settings, the administrator is responsible |
| for changing the 'host' global config value from the management |
| server IP to the load balancer virtual IP address. If the 'host' |
| value is not set to the VIP for port 8250 and one of your management |
| servers crashes, the UI is still available but the system VMs will |
| not be able to contact the management server. |
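| |
| As a sketch, assuming the CloudMonkey (cmk) CLI and a hypothetical |
| VIP of 192.0.2.10, the 'host' value could be updated through the |
| updateConfiguration API:: |
| |
|    cmk update configuration name=host value=192.0.2.10 |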
| |
| |
| Multiple Management Servers Support on agents |
| --------------------------------------------- |
| |
| In a CloudStack environment with multiple management servers, an agent |
| can be configured to choose, based on an algorithm, which management |
| server to connect to. This can be useful as an internal load balancer |
| or for high availability. An administrator is responsible for setting |
| the list of management servers and choosing a sorting algorithm using |
| global settings. The management server is responsible for propagating |
| these settings to the connected agents (running inside the Secondary |
| Storage Virtual Machine, the Console Proxy Virtual Machine, or on the |
| KVM hosts). |
| |
| The three global settings that need to be configured are the following: |
| |
| - host: a comma-separated list of management server IP addresses |
| - indirect.agent.lb.algorithm: The algorithm for the indirect agent LB |
| - indirect.agent.lb.check.interval: The preferred host check interval |
| for the agent's background task that checks and switches to an agent's |
| preferred host. |
| |
| These settings can be configured from the global settings page in the UI or |
| using the updateConfiguration API call. |
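| |
| For example, using the CloudMonkey (cmk) CLI with placeholder |
| addresses and an illustrative interval value, the three settings |
| could be set as follows:: |
| |
|    cmk update configuration name=host value=10.1.1.11,10.1.1.12,10.1.1.13 |
|    cmk update configuration name=indirect.agent.lb.algorithm value=roundrobin |
|    cmk update configuration name=indirect.agent.lb.check.interval value=300 |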
| |
| The indirect.agent.lb.algorithm setting supports the following |
| algorithm options: |
| |
| - static: Use the list of management server IP addresses as provided. |
| - roundrobin: Evenly spread hosts across management servers, based on the |
| host's id. |
| - shuffle: Pseudo-randomly sort the list (this is not recommended for |
| production). |
| |
| .. note:: |
| The 'static' and 'roundrobin' algorithms strictly check the order of |
| the comma-separated management server addresses, whereas the 'shuffle' |
| algorithm only checks their content, not their order. |
| |
| Changes to the `indirect.agent.lb.algorithm` and `host` global |
| settings do not require a restart of the management server(s) or the |
| agents. A change to these global settings is propagated to all |
| connected agents. |
| |
| The comma-separated management server list is propagated to agents in |
| the following cases: |
| |
| - An addition of an agent (including the SSVM and CPVM system VMs). |
| - Connection or reconnection of an agent to a management server. |
| - After an administrator changes the 'host' and/or the |
| 'indirect.agent.lb.algorithm' global settings. |
| |
| On the agent side, the 'host' setting is saved in its properties file as: |
| `host=<comma separated addresses>@<algorithm name>`. |
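| |
| For example, with three management servers and the 'static' |
| algorithm, the agent's `agent.properties` would contain a line such |
| as the following (addresses are placeholders):: |
| |
|    host=10.1.1.11,10.1.1.12,10.1.1.13@static |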
| |
| From the agent's perspective, the first address in the propagated |
| list is considered the preferred host. A background task that checks |
| the preferred host can be activated by configuring the |
| `indirect.agent.lb.check.interval` setting, which is a cluster-level |
| global setting in CloudStack. Administrators can also override it on |
| a particular agent by setting 'host.lb.check.interval' in that |
| agent's `agent.properties` file. |
| |
| Whenever an agent receives a host and algorithm combination, the |
| host-specific background check interval is also sent, and the |
| background task is dynamically reconfigured without the need to |
| restart the agent. |
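| |
| For instance, to have a particular agent check for its preferred |
| host every 300 seconds (an illustrative value, assuming the interval |
| is expressed in seconds), the following could be added to that |
| agent's `agent.properties`:: |
| |
|    host.lb.check.interval=300 |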
| |
| To make things clearer, consider this example: suppose an environment |
| has three management servers, A, B and C, and three KVM agents. |
| |
| With 'host' = 'A,B,C', the agents will receive lists depending on the |
| 'indirect.agent.lb.algorithm' value: |
| |
| - 'static': each agent receives the list 'A,B,C'. |
| - 'roundrobin': the first agent receives 'A,B,C', the second agent |
| receives 'B,C,A', and the third agent receives 'C,A,B'. |
| - 'shuffle': each agent receives the list in a (pseudo) random order. |
| |
| HA-Enabled Virtual Machines |
| --------------------------- |
| |
| The user can specify a virtual machine as HA-enabled. By default, all |
| virtual router VMs and Elastic Load Balancing VMs are automatically |
| configured as HA-enabled. When an HA-enabled VM crashes, CloudStack |
| detects the crash and restarts the VM automatically within the same |
| Availability Zone. HA is never performed across different Availability |
| Zones. CloudStack has a conservative policy towards restarting VMs and |
| ensures that there will never be two instances of the same VM running at |
| the same time. The Management Server attempts to start the VM on another |
| Host in the same cluster. |
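| |
| In practice, a user VM becomes HA-enabled through its compute |
| offering. As a sketch using the CloudMonkey CLI (the offering name |
| and sizing values are placeholders), an HA-enabled offering can be |
| created with the 'offerha' flag:: |
| |
|    cmk create serviceoffering name=HA-Medium displaytext="HA Medium" \ |
|        cpunumber=2 cpuspeed=1000 memory=2048 offerha=true |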
| |
| HA features work with iSCSI or NFS primary storage. HA with local |
| storage is not supported. |
| |
| |
| Dedicated HA Hosts |
| ------------------ |
| |
| One or more hosts can be designated for use only by HA-enabled VMs that |
| are restarting due to a host failure. Setting up a pool of such |
| dedicated HA hosts as the recovery destination for all HA-enabled VMs is |
| useful to: |
| |
| - Make it easier to determine which VMs have been restarted as part of |
| the CloudStack high-availability function. If a VM is running on a |
| dedicated HA host, then it must be an HA-enabled VM whose original |
| host failed. (With one exception: it is possible for an administrator |
| to manually migrate any VM to a dedicated HA host.) |
| |
| - Keep HA-enabled VMs from restarting on hosts which may be reserved |
| for other purposes. |
| |
| The dedicated HA option is set through a special host tag when the host |
| is created. To allow the administrator to dedicate hosts to only |
| HA-enabled VMs, set the global configuration variable ha.tag to the |
| desired tag (for example, "ha\_host"), and restart the Management |
| Server. Enter the value in the Host Tags field when adding the host(s) |
| that you want to dedicate to HA-enabled VMs. |
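| |
| As an illustrative sketch using the CloudMonkey CLI (the tag name |
| "ha_host" matches the example above; the host UUID is a |
| placeholder):: |
| |
|    cmk update configuration name=ha.tag value=ha_host |
|    # restart the management server, then tag the dedicated host(s): |
|    cmk update host id=<host-uuid> hosttags=ha_host |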
| |
| .. note:: |
| If you set ha.tag, be sure to actually use that tag on at least one |
| host in your cloud. If the tag specified in ha.tag is not set for |
| any host in the cloud, the HA-enabled VMs will fail to restart after |
| a crash. |
| |
| |
| HA-Enabled Hosts |
| ---------------- |
| |
| The user can specify a host as HA-enabled. In the event of a host |
| failure, attempts will be made to recover the failed host, first by |
| issuing out-of-band management (OOBM) commands. If host recovery |
| fails, the host will be fenced and placed into maintenance mode; |
| manual intervention is then required to restore the host to normal |
| operation. |
| |
| Out-of-band management is a requirement for HA-enabled hosts and has |
| to be configured on all intended participating hosts. |
| (see `“Out of band management” <hosts.html#out-of-band-management>`_). |
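| |
| As a hedged illustration (the driver, IPMI address and credentials |
| are placeholders; see the out-of-band management documentation for |
| the authoritative parameters), OOBM might be configured and enabled |
| on a host as follows:: |
| |
|    cmk configure outofbandmanagement hostid=<host-uuid> driver=ipmitool \ |
|        address=10.2.2.31 port=623 username=ADMIN password=<secret> |
|    cmk enable outofbandmanagementforhost hostid=<host-uuid> |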
| |
| Host-HA can be configured at a granular host, cluster, or zone level. |
| In a large environment, some hosts in a cluster can be HA-enabled and |
| others not. |
| |
| Host-HA uses a state machine design to manage the operations of |
| recovering and fencing hosts. The current HA state of a host is |
| reported when querying a specific host. |
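| |
| For example (UUIDs are placeholders and the provider name is an |
| assumption for KVM), Host-HA could be configured and enabled for a |
| single host, and its current HA state queried, as follows:: |
| |
|    cmk configure haforhost hostid=<host-uuid> provider=kvmhaprovider |
|    cmk enable haforhost hostid=<host-uuid> |
|    cmk list hostharesources hostid=<host-uuid> |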
| |
| Periodic health investigations are performed on HA-enabled hosts to |
| monitor for failures. A threshold can be set for failed |
| investigations; only when it is exceeded will the host transition to |
| a different state. |
| |
| Host-HA uses both health checks and activity checks to decide on |
| recovery and fencing actions. Once a host is determined to be in a |
| faulty state (its health checks have failed), activity checks are run |
| to determine whether there is any disk activity from the VMs running |
| on that host. |
| |
| The HA Resource Management Service manages the check/recovery cycle including |
| periodic execution, concurrency management, persistence, back pressure and |
| clustering operations. Administrators associate a provider with a partition |
| type (e.g. KVM HA Host provider to clusters) and may override the provider on a |
| per-partition (i.e. zone, cluster, or pod) basis. The service operates on all |
| resources of the type supported by the provider contained in a partition. |
| Administrators can also enable or disable HA operations globally or on a |
| per-partition basis. |
| |
| Only one (1) HA provider per resource type may be specified for a |
| partition. Nested HA providers for the same resource type are not |
| supported (e.g. a pod specifying an HA resource provider for hosts |
| and a cluster within that pod specifying an HA resource provider for |
| hosts). The service is designed to be opt-in, whereby only resources |
| with a defined provider and HA enabled will be managed. |
| |
| For each resource in an HA partition, the HA Resource Management |
| Service maintains and persists a finite state machine (FSM) composed |
| of the following states: |
| |
| - AVAILABLE - The feature is enabled and Host-HA is available. |
| - SUSPECT - There are health checks failing with the host. |
| - CHECKING - Activity checks are being performed. |
| - DEGRADED - The host is passing the activity check ratio and still providing |
| service to the end user, but it cannot be managed from the CloudStack |
| management server. |
| - RECOVERING - The Host-HA framework is trying to recover the host by issuing |
| OOBM jobs. |
| - RECOVERED - The Host-HA framework has recovered the host successfully. |
| - FENCING - The Host-HA framework is trying to fence the host by issuing OOBM |
| jobs. |
| - FENCED - The Host-HA framework has fenced the host successfully. |
| - DISABLED - The feature is disabled for the host. |
| - INELIGIBLE - The feature is enabled, but the host cannot be managed |
| successfully by the Host-HA framework (for example, OOBM is not |
| configured properly). |
| |
| When HA is enabled for a partition, the HA state of all contained resources |
| will be transitioned from DISABLED to AVAILABLE. Based on the state models, the |
| following failure scenarios and their responses will be handled by the HA |
| resource management service: |
| |
| - Activity check operation fails on the resource: the activity check |
| protocol provides a way to express that an error occurred while |
| performing the activity check, together with a reason for the failure |
| (e.g. unable to access the NFS mount). If the maximum number of |
| activity check attempts has not been exceeded, the activity check |
| will be retried. |
| |
| - Slow activity check operation: After a configurable timeout, the HA resource |
| management service abandons the check. The response to this condition would |
| be the same as a failure to recover the resource. |
| |
| - Traffic flood due to a large number of resource recoveries: The HA resource |
| management service must limit the number of concurrent recovery operations |
| permitted to avoid overwhelming the management server with resource status |
| updates as recovery operations complete. |
| |
| - Processor/memory starvation due to large number of activity check |
| operations: The HA resource management service must limit the number of |
| concurrent activity check operations permitted per management server to |
| prevent checks from starving other management server activities of scarce |
| processor and/or memory resources. |
| |
| - A SUSPECT, CHECKING, or RECOVERING resource passes a health check before the |
| state action completes: The HA resource management service refreshes the HA |
| state of the resource before transition. If it does not match the expected |
| current state, the result of the state action is ignored. |
| |
| For further information around the inner workings of Host HA, refer |
| to the design document at |
| `https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA |
| <https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA>`_ |
| |
| Primary Storage Outage and Data Loss |
| ------------------------------------ |
| |
| When a primary storage outage occurs, the hypervisor immediately |
| stops all VMs stored on that storage device. Guests that are marked |
| for HA will be restarted as soon as practical when the primary |
| storage comes back online. With NFS, the hypervisor may allow the |
| virtual machines to continue running, depending on the nature of the |
| issue. For example, an NFS hang will cause the guest VMs to be |
| suspended until storage connectivity is restored. |
| |
| Primary storage is not designed to be backed up. Individual volumes |
| in primary storage can be backed up using snapshots. |
| |
| |
| Secondary Storage Outage and Data Loss |
| -------------------------------------- |
| |
| For a Zone that has only one secondary storage server, a secondary |
| storage outage will have a feature-level impact on the system but |
| will not impact running guest VMs. Users may be unable to create a VM |
| from a selected template, or to save snapshots or examine/restore |
| saved snapshots. These features will automatically become available |
| again when the secondary storage comes back online. |
| |
| Secondary storage data loss will impact recently added user data |
| including templates, snapshots, and ISO images. Secondary storage should |
| be backed up periodically. Multiple secondary storage servers can be |
| provisioned within each zone to increase the scalability of the system. |
| |
| |
| Database High Availability |
| -------------------------- |
| |
| To help ensure high availability of the databases that store the |
| internal data for CloudStack, you can set up database replication. This |
| covers both the main CloudStack database and the Usage database. |
| Replication is achieved using the MySQL connector parameters and |
| two-way replication. It has been tested with MySQL 5.1 and 5.5. |
| |
| |
| How to Set Up Database Replication |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Database replication in CloudStack is provided using the MySQL |
| replication capabilities. The steps to set up replication can be found |
| in the MySQL documentation (links are provided below). It is suggested |
| that you set up two-way replication, which involves two database nodes. |
| In this case, for example, you might have node1 and node2. |
| |
| You can also set up chain replication, which involves more than two |
| nodes. In this case, you would first set up two-way replication with |
| node1 and node2. Next, set up one-way replication from node2 to node3. |
| Then set up one-way replication from node3 to node4, and so on for all |
| the additional nodes. |
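| |
| A very condensed sketch of two-way replication between node1 and |
| node2 (MySQL 5.x syntax; user names and passwords are placeholders, |
| and binary log file/position options are omitted for brevity; follow |
| the MySQL documentation below for the full procedure):: |
| |
|    # on node1 in /etc/my.cnf (use server-id = 2 on node2), then |
|    # restart mysqld |
|    [mysqld] |
|    server-id = 1 |
|    log-bin   = mysql-bin |
| |
|    -- on each node, create a replication user and point the node at |
|    -- its peer (MASTER_HOST is the other node) |
|    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'secret'; |
|    CHANGE MASTER TO MASTER_HOST='node1', MASTER_USER='repl', |
|                     MASTER_PASSWORD='secret'; |
|    START SLAVE; |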
| |
| References: |
| |
| - `http://dev.mysql.com/doc/refman/5.0/en/replication-howto.html <http://dev.mysql.com/doc/refman/5.0/en/replication-howto.html>`_ |
| |
| - `https://wikis.oracle.com/display/CommSuite/MySQL+High+Availability+and+Replication+Information+For+Calendar+Server <https://wikis.oracle.com/display/CommSuite/MySQL+High+Availability+and+Replication+Information+For+Calendar+Server>`_ |
| |
| |
| Configuring Database High Availability |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| To control the database high availability behavior, use the following |
| configuration settings in the file |
| /etc/cloudstack/management/db.properties. |
| |
| **Required Settings** |
| |
| Be sure you have set the following in db.properties: |
| |
| - ``db.ha.enabled``: set to true if you want to use the replication |
| feature. |
| |
| Example: ``db.ha.enabled=true`` |
| |
| - ``db.cloud.replicas``: set to a comma-delimited set of replica hosts for the |
| cloud database. This is the list of nodes set up with replication. |
| The source node is not in the list, since it is already mentioned |
| elsewhere in the properties file. |
| |
| Example: ``db.cloud.replicas=node2,node3,node4`` |
| |
| - ``db.usage.replicas``: set to a comma-delimited set of replica hosts for the |
| usage database. This is the list of nodes set up with replication. |
| The source node is not in the list, since it is already mentioned |
| elsewhere in the properties file. |
| |
| Example: ``db.usage.replicas=node2,node3,node4`` |
| |
| **Optional Settings** |
| |
| The following settings must be present in db.properties, but you are not |
| required to change the default values unless you wish to do so for |
| tuning purposes: |
| |
| - ``db.cloud.secondsBeforeRetrySource``: The number of seconds the MySQL |
| connector should wait before trying again to connect to the source |
| after the source went down. Default is 1 hour. The retry might happen |
| sooner if db.cloud.queriesBeforeRetrySource is reached first. |
| |
| Example: ``db.cloud.secondsBeforeRetrySource=3600`` |
| |
| - ``db.cloud.queriesBeforeRetrySource``: The minimum number of queries to |
| be sent to the database before trying again to connect to the source |
| after the source went down. Default is 5000. The retry might happen |
| sooner if db.cloud.secondsBeforeRetrySource is reached first. |
| |
| Example: ``db.cloud.queriesBeforeRetrySource=5000`` |
| |
| - ``db.cloud.initialTimeout``: Initial time the MySQL connector should wait |
| before trying again to connect to the source. Default is 3600. |
| |
| Example: ``db.cloud.initialTimeout=3600`` |
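| |
| Putting the above together, the relevant portion of db.properties |
| might look like this (the node names are the example replica hosts |
| used above):: |
| |
|    db.ha.enabled=true |
|    db.cloud.replicas=node2,node3,node4 |
|    db.usage.replicas=node2,node3,node4 |
|    db.cloud.secondsBeforeRetrySource=3600 |
|    db.cloud.queriesBeforeRetrySource=5000 |
|    db.cloud.initialTimeout=3600 |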
| |
| |
| Limitations on Database High Availability |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The following limitations exist in the current implementation of this |
| feature. |
| |
| - Replica hosts cannot be monitored through CloudStack. You will need |
| to have a separate means of monitoring. |
| |
| - Events from the database side are not integrated with the CloudStack |
| Management Server events system. |
| |
| - You must periodically perform manual clean-up of bin log files |
| generated by replication on database nodes. If you do not clean up |
| the log files, the disk can become full. |