| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| http://www.apache.org/licenses/LICENSE-2.0 |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| |
| TroubleShooting |
| =============== |
| |
| Working with Server Logs |
| ------------------------ |
| |
| The CloudStack Management Server logs all web site, middle tier, and |
| database activities for diagnostic purposes in |
| `/var/log/cloudstack/management/`. CloudStack logs a variety of error |
| messages. We recommend this command to find the problematic output in |
| the Management Server log: |
| |
| .. note:: |
| When copying and pasting a command, be sure the command has pasted as a |
| single line before executing. Some document viewers may introduce |
| unwanted line breaks in copied text. |
| |
| .. code:: bash |
| |
| grep -i -E 'exception|unable|fail|invalid|leak|warn|error' /var/log/cloudstack/management/management-server.log |
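| |
| To see which kind of problem dominates, the matches can also be tallied per |
| keyword. The following is a minimal sketch, not a CloudStack tool: the |
| function name `count_log_keywords` is hypothetical, and the log path is |
| passed as an argument rather than assumed. |
| |
```shell
# Hypothetical helper: count the log lines that match each keyword from the
# grep command above. A line that contains two keywords is counted under both.
count_log_keywords() {
    local log_file="$1"
    local keyword
    for keyword in exception unable fail invalid leak warn error; do
        # grep -c prints the number of matching lines; -i ignores case.
        printf '%s\t%s\n' "$keyword" "$(grep -c -i -- "$keyword" "$log_file")"
    done
}
```
| |
| For example, `count_log_keywords management-server.log` prints one row per |
| keyword with its line count. |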
| |
| CloudStack processes requests with a job ID. If you find an error in |
| the logs and want to debug the issue, you can grep for this job ID in |
| the Management Server log. For example, suppose that you find the |
| following ERROR message: |
| |
| .. code:: bash |
| |
| 2010-10-04 13:49:32,595 ERROR [cloud.vm.UserVmManagerImpl] (Job-Executor-11:job-1076) Unable to find any host for [User|i-8-42-VM-untagged] |
| |
| Note that the job ID is 1076. You can track back the events relating to |
| job 1076 with the following grep: |
| |
| .. code:: bash |
| |
| grep "job-1076)" management-server.log |
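| |
| The two steps above (spot an ERROR line, then grep for its job ID) can be |
| combined. This is a minimal sketch under the assumption that job IDs appear |
| as `job-<number>)` as in the sample message; the function name |
| `trace_first_failed_job` is hypothetical. |
| |
```shell
# Hypothetical helper: print every log line that belongs to the same job as
# the first ERROR line that carries a job ID.
trace_first_failed_job() {
    local log_file="$1"
    local job_id
    # Take the first "job-<number>" token found on an ERROR line.
    job_id=$(grep ' ERROR ' "$log_file" | grep -o -E 'job-[0-9]+' | head -n 1)
    if [ -z "$job_id" ]; then
        echo "no job ID found on any ERROR line" >&2
        return 1
    fi
    # The trailing ")" keeps job-1076 from also matching job-10761.
    grep -F "${job_id})" "$log_file"
}
```
| |
| Run it as `trace_first_failed_job management-server.log`. It only follows |
| the first failing job, so repeat the manual grep for any other job IDs. |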
| |
| The CloudStack Agent Server logs its activities in `/var/log/cloudstack/agent/`. |
| |
| |
| Data Loss on Exported Primary Storage |
| ------------------------------------- |
| |
| Symptom |
| ~~~~~~~ |
| |
| Loss of existing data on primary storage which has been exposed as a |
| Linux NFS server export on an iSCSI volume. |
| |
| |
| Cause |
| ~~~~~ |
| |
| It is possible that a client from outside the intended pool has mounted |
| the storage. When this occurs, the LVM is wiped and all data in the |
| volume is lost. |
| |
| |
| Solution |
| ~~~~~~~~ |
| |
| When setting up LUN exports, restrict the range of IP addresses that are |
| allowed access by specifying a subnet mask. For example: |
| |
| .. code:: bash |
| |
| echo "/export 192.168.1.0/24(rw,async,no_root_squash,no_subtree_check)" > /etc/exports |
| |
| Adjust the above command to suit your deployment needs. |
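| |
| It can also help to look for export entries that do not restrict clients at |
| all. The following is a minimal sketch, assuming the usual exports layout of |
| a path followed by `client(options)` fields; the function name |
| `find_open_exports` is hypothetical, and the check is a heuristic rather |
| than a full parser. |
| |
```shell
# Hypothetical check: print the path of every export entry that any client
# may mount. The exports file is passed as an argument.
find_open_exports() {
    local exports_file="$1"
    # Skip comments and blank lines. Flag an entry when it names no client,
    # or when a client field is the "*" wildcard or bare "(options)".
    awk '!/^[ \t]*(#|$)/ {
        wide = (NF < 2)
        for (i = 2; i <= NF; i++) if ($i ~ /^[(*]/) wide = 1
        if (wide) print $1
    }' "$exports_file"
}
```
| |
| For example, `find_open_exports /etc/exports` prints nothing when every |
| entry is limited to a host or subnet such as 192.168.1.0/24. |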
| |
| |
| More Information |
| ~~~~~~~~~~~~~~~~ |
| |
| See the export procedure in the "Secondary Storage" section of the |
| CloudStack Installation Guide. |
| |
| |
| Recovering a Lost Virtual Router |
| -------------------------------- |
| |
| Symptom |
| ~~~~~~~ |
| |
| A virtual router is running, but the host is disconnected. A virtual |
| router no longer functions as expected. |
| |
| |
| Cause |
| ~~~~~ |
| |
| The virtual router is lost or down. |
| |
| |
| Solution |
| ~~~~~~~~ |
| |
| If you are sure that a virtual router is down forever, or no longer |
| functions as expected, destroy it. You must create one afresh while |
| keeping the backup router up and running (it is assumed this is in a |
| redundant router setup): |
| |
| - Force stop the router. Use the stopRouter API with the forced=true |
| parameter to do so. |
| |
| - Before you continue with destroying this router, ensure that the |
| backup router is running. Otherwise the network connection will be |
| lost. |
| |
| - Destroy the router by using the destroyRouter API. |
| |
| Recreate the missing router by using the restartNetwork API with the |
| cleanup=false parameter. For more information about redundant router |
| setup, see Creating a New Network Offering. |
| |
| For more information about the API syntax, see the API Reference at |
| `https://cloudstack.apache.org/api.html <https://cloudstack.apache.org/api.html>`_. |
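| |
| The recovery sequence above can be sketched as a dry run that only prints |
| the commands. This is an illustration, not a tested procedure: the function |
| name `plan_router_recovery` is hypothetical, the `verb noun` CloudMonkey |
| command forms are an assumption modelled on the `list traffictypes` call |
| used elsewhere in this guide, and the `list routers` check is one assumed |
| way to confirm that the backup router is running. |
| |
```shell
# Dry run: print the commands for replacing a failed router in a redundant
# pair. Nothing is executed. Verify each command against your CloudMonkey
# version and the API Reference before running it.
plan_router_recovery() {
    local router_id="$1"    # ID of the failed router
    local network_id="$2"   # ID of the network the router serves
    if [ -z "$router_id" ] || [ -z "$network_id" ]; then
        echo "usage: plan_router_recovery ROUTER_ID NETWORK_ID" >&2
        return 1
    fi
    # 1. Force stop the failed router (stopRouter with forced=true).
    echo "cloudmonkey stop router id=${router_id} forced=true"
    # 2. Confirm the backup router is running before destroying anything.
    echo "cloudmonkey list routers networkid=${network_id}"
    # 3. Destroy the failed router (destroyRouter).
    echo "cloudmonkey destroy router id=${router_id}"
    # 4. Recreate the missing router (restartNetwork with cleanup=false).
    echo "cloudmonkey restart network id=${network_id} cleanup=false"
}
```
| |
| Printing the plan first lets you review the IDs before anything touches the |
| running network. |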
| |
| |
| Maintenance mode not working on vCenter |
| --------------------------------------- |
| |
| Symptom |
| ~~~~~~~ |
| |
| Host was placed in maintenance mode, but still appears live in vCenter. |
| |
| |
| Cause |
| ~~~~~~ |
| |
| The CloudStack administrator UI was used to place the host in scheduled |
| maintenance mode. This mode is separate from vCenter's maintenance mode. |
| |
| |
| Solution |
| ~~~~~~~~ |
| |
| Use vCenter to place the host in maintenance mode. |
| |
| |
| Unable to deploy VMs from uploaded vSphere template |
| --------------------------------------------------- |
| |
| Symptom |
| ~~~~~~~~ |
| |
| When attempting to create a VM, the VM will not deploy. |
| |
| |
| Cause |
| ~~~~~ |
| |
| If the template was created by uploading an OVA file that was created |
| using vSphere Client, it is possible the OVA contained an ISO image. If |
| it does, the deployment of VMs from the template will fail. |
| |
| |
| Solution |
| ~~~~~~~~ |
| |
| Remove the ISO and re-upload the template. |
| |
| |
| Unable to power on virtual machine on VMware |
| -------------------------------------------- |
| |
| Symptom |
| ~~~~~~~ |
| |
| Virtual machine does not power on. You might see errors like: |
| |
| - Unable to open Swap File |
| |
| - Unable to access a file since it is locked |
| |
| - Unable to access Virtual machine configuration |
| |
| |
| Cause |
| ~~~~~ |
| |
| This is a known issue on VMware machines. ESX hosts lock certain critical |
| virtual machine files and file systems to prevent concurrent changes. |
| Sometimes the files are not unlocked when the virtual machine is powered |
| off. When a virtual machine attempts to power on, it cannot access |
| these critical files, and the virtual machine is unable to power on. |
| |
| |
| Solution |
| ~~~~~~~~ |
| |
| See the following: |
| |
| `VMware Knowledge Base Article |
| <http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051/>`_ |
| |
| |
| Load balancer rules fail after changing network offering |
| -------------------------------------------------------- |
| |
| Symptom |
| ~~~~~~~ |
| |
| After changing the network offering on a network, load balancer rules |
| stop working. |
| |
| |
| Cause |
| ~~~~~ |
| |
| Load balancing rules were created while using a network service offering |
| that includes an external load balancer device such as NetScaler, and |
| the network service offering was later changed to one that uses the |
| CloudStack virtual router. |
| |
| |
| Solution |
| ~~~~~~~~ |
| |
| Create a firewall rule on the virtual router for each of your existing |
| load balancing rules so that they continue to function. |
| |
| |
| Troubleshooting Internet Traffic |
| -------------------------------- |
| |
| Below are a few troubleshooting steps to check what is going wrong with |
| your network. |
| |
| |
| Trouble Shooting Steps |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| |
| #. The switches have to be configured correctly to pass VLAN traffic. You can |
| verify that VLAN traffic is working by bringing up a tagged interface on the |
| hosts and pinging between them as shown below. |
| |
| On *host1 (kvm1)* |
| |
| :: |
| |
| kvm1 ~$ vconfig add eth0 64 |
| kvm1 ~$ ifconfig eth0.64 1.2.3.4 netmask 255.255.255.0 up |
| kvm1 ~$ ping 1.2.3.5 |
| |
| On *host2 (kvm2)* |
| |
| :: |
| |
| kvm2 ~$ vconfig add eth0 64 |
| kvm2 ~$ ifconfig eth0.64 1.2.3.5 netmask 255.255.255.0 up |
| kvm2 ~$ ping 1.2.3.4 |
| |
| If the pings don't work, run *tcpdump(8)* on the hosts and along the path |
| to find where the packets are being dropped. Ultimately, if the switches |
| are not configured correctly, CloudStack networking won't work, so fix the |
| physical networking issues before you proceed to the next steps. |
| |
| #. Ensure `Traffic Labels <http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Installation_Guide/about-physical-networks.html>`_ are set for the Zone. |
| |
| Traffic labels need to be set for all hypervisors including |
| XenServer, KVM and VMware types. You can configure traffic labels when |
| you are creating a new zone from the *Add Zone Wizard*. |
| |
| .. image:: /_static/images/networking-zone-traffic-labels.png |
| |
| On an existing zone, you can modify the traffic labels by going to |
| *Infrastructure, Zones, Physical Network* tab. |
| |
| .. image:: /_static/images/networking-infra-traffic-labels.png |
| |
| List labels using *CloudMonkey* |
| |
| :: |
| |
| acs-manager ~$ cloudmonkey list traffictypes physicalnetworkid=41cb7ff6-8eb2-4630-b577-1da25e0e1145 |
| count = 4 |
| traffictype: |
| id = cd0915fe-a660-4a82-9df7-34aebf90003e |
| kvmnetworklabel = cloudbr0 |
| physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 |
| traffictype = Guest |
| xennetworklabel = MGMT |
| ======================================================== |
| id = f5524b8f-6605-41e4-a982-81a356b2a196 |
| kvmnetworklabel = cloudbr0 |
| physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 |
| traffictype = Management |
| xennetworklabel = MGMT |
| ======================================================== |
| id = 266bad0e-7b68-4242-b3ad-f59739346cfd |
| kvmnetworklabel = cloudbr0 |
| physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 |
| traffictype = Public |
| xennetworklabel = MGMT |
| ======================================================== |
| id = a2baad4f-7ce7-45a8-9caf-a0b9240adf04 |
| kvmnetworklabel = cloudbr0 |
| physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 |
| traffictype = Storage |
| xennetworklabel = MGMT |
| ========================================================= |
| |
| #. KVM traffic labels must be named *"cloudbr0"*, *"cloudbr2"*, |
| *"cloudbrN"*, etc., and the corresponding bridge must exist on the KVM |
| hosts. If you create labels/bridges with any other names, CloudStack |
| (at least in earlier versions) seems to ignore them. CloudStack does not |
| create the physical bridges on the KVM hosts; you need to create them |
| **before** adding the host to CloudStack. |
| |
| :: |
| |
| kvm1 ~$ ifconfig cloudbr0 |
| cloudbr0 Link encap:Ethernet HWaddr 00:0C:29:EF:7D:78 |
| inet addr:192.168.44.22 Bcast:192.168.44.255 Mask:255.255.255.0 |
| inet6 addr: fe80::20c:29ff:feef:7d78/64 Scope:Link |
| UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 |
| RX packets:92435 errors:0 dropped:0 overruns:0 frame:0 |
| TX packets:50596 errors:0 dropped:0 overruns:0 carrier:0 |
| collisions:0 txqueuelen:0 |
| RX bytes:94985932 (90.5 MiB) TX bytes:61635793 (58.7 MiB) |
| |
| #. The Virtual Router, SSVM, and CPVM *public* interfaces are bridged to |
| a physical interface on the host. In the example below, *cloudbr0* is |
| the public interface and CloudStack has correctly bridged the virtual |
| interfaces. This virtual interface to physical interface mapping |
| is done automatically by CloudStack using the traffic label settings for |
| the Zone. If you have provided correct settings and still don't have |
| working Internet access, check the switching layer before you debug any |
| further. You can verify traffic using tcpdump on the virtual, physical |
| and bridge interfaces. |
| |
| :: |
| |
| kvm-host1 ~$ brctl show |
| bridge name bridge id STP enabled interfaces |
| breth0-64 8000.000c29ef7d78 no eth0.64 |
| vnet2 |
| cloud0 8000.fe00a9fe0219 no vnet0 |
| cloudbr0 8000.000c29ef7d78 no eth0 |
| vnet1 |
| vnet3 |
| virbr0 8000.5254008e321a yes virbr0-nic |
| |
| :: |
| |
| xenserver1 ~$ brctl show |
| bridge name bridge id STP enabled interfaces |
| xapi0 0000.e2b76d0a1149 no vif1.0 |
| xenbr0 0000.000c299b54dc no eth0 |
| xapi1 |
| vif1.1 |
| vif1.2 |
| |
| #. Pre-create labels on the XenServer Hosts. Similar to KVM bridge |
| setup, traffic labels must also be pre-created on the XenServer hosts |
| before adding them to CloudStack. |
| |
| :: |
| |
| xenserver1 ~$ xe network-list |
| uuid ( RO) : aaa-bbb-ccc-ddd |
| name-label ( RW): MGMT |
| name-description ( RW): |
| bridge ( RO): xenbr0 |
| |
| |
| #. The Internet should be accessible from both the SSVM and CPVM |
| instances by default. Their public IPs will also be directly pingable |
| from the Internet. Please note that these tests work only if your |
| switches and traffic labels are configured correctly for your |
| environment. If your SSVM/CPVM can't reach the Internet, it's very |
| unlikely that the Virtual Router (VR) can reach the Internet either, |
| which suggests either a switching issue or incorrectly assigned |
| traffic labels. Fix the SSVM/CPVM issues before you debug VR issues. |
| |
| :: |
| |
| root@s-1-VM:~# ping -c 3 google.com |
| PING google.com (74.125.236.164): 56 data bytes |
| 64 bytes from 74.125.236.164: icmp_seq=0 ttl=55 time=26.932 ms |
| 64 bytes from 74.125.236.164: icmp_seq=1 ttl=55 time=29.156 ms |
| 64 bytes from 74.125.236.164: icmp_seq=2 ttl=55 time=25.000 ms |
| --- google.com ping statistics --- |
| 3 packets transmitted, 3 packets received, 0% packet loss |
| round-trip min/avg/max/stddev = 25.000/27.029/29.156/1.698 ms |
| |
| :: |
| |
| root@v-2-VM:~# ping -c 3 google.com |
| PING google.com (74.125.236.164): 56 data bytes |
| 64 bytes from 74.125.236.164: icmp_seq=0 ttl=55 time=32.125 ms |
| 64 bytes from 74.125.236.164: icmp_seq=1 ttl=55 time=26.324 ms |
| 64 bytes from 74.125.236.164: icmp_seq=2 ttl=55 time=37.001 ms |
| --- google.com ping statistics --- |
| 3 packets transmitted, 3 packets received, 0% packet loss |
| round-trip min/avg/max/stddev = 26.324/31.817/37.001/4.364 ms |
| |
| |
| #. The Virtual Router (VR) should also be able to reach the Internet |
| without having any Egress rules. The Egress rules only control forwarded |
| traffic and not traffic that originates on the VR itself. |
| |
| :: |
| |
| root@r-4-VM:~# ping -c 3 google.com |
| PING google.com (74.125.236.164): 56 data bytes |
| 64 bytes from 74.125.236.164: icmp_seq=0 ttl=55 time=28.098 ms |
| 64 bytes from 74.125.236.164: icmp_seq=1 ttl=55 time=34.785 ms |
| 64 bytes from 74.125.236.164: icmp_seq=2 ttl=55 time=69.179 ms |
| --- google.com ping statistics --- |
| 3 packets transmitted, 3 packets received, 0% packet loss |
| round-trip min/avg/max/stddev = 28.098/44.021/69.179/17.998 ms |
| |
| #. However, the Virtual Router's (VR) Source NAT Public IP address |
| **WON'T** be reachable until appropriate Ingress rules are |
| in place. You can add *Ingress* rules under *Network, Guest Network, IP |
| Address, Firewall* setting page. |
| |
| .. image:: /_static/images/networking-ingress-rule.png |
| |
| #. By default, the VM instances won't be able to access the Internet. Add |
| Egress rules to permit traffic. |
| |
| .. image:: /_static/images/networking-egress-rule.png |
| |
| #. Some users have reported that flushing iptables rules (or changing |
| routes) on the SSVM, CPVM or the Virtual Router makes the Internet work. |
| This is not expected behaviour and suggests that your networking |
| settings are incorrect. No iptables/route changes are required on the |
| SSVM, CPVM or the VR. Go back and double check all your settings. |
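| |
| One of the settings worth double checking is that every KVM traffic label |
| names a bridge that exists on the host. The following is a minimal sketch, |
| assuming Linux bridges (which expose a `bridge` directory in sysfs) and a |
| saved copy of the `list traffictypes` output shown earlier; the function |
| name `check_kvm_label_bridges` is hypothetical. |
| |
```shell
# Hypothetical check: every "kvmnetworklabel = <name>" line in the saved
# listing should name a Linux bridge on this host. The second argument lets
# you point at another sysfs root; it defaults to /sys/class/net.
check_kvm_label_bridges() {
    local listing_file="$1"
    local sysfs_net="${2:-/sys/class/net}"
    local label missing=0
    # Collect the distinct labels from the listing.
    for label in $(awk '$1 == "kvmnetworklabel" && $2 == "=" { print $3 }' \
                       "$listing_file" | sort -u); do
        # A Linux bridge device has a "bridge" subdirectory in sysfs.
        if [ -d "$sysfs_net/$label/bridge" ]; then
            echo "ok: $label"
        else
            echo "missing bridge: $label"
            missing=1
        fi
    done
    return "$missing"
}
```
| |
| Run it on each KVM host. A non-zero exit status means at least one label has |
| no matching bridge there. |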
| |
| |
| In a vast majority of the cases, the problem has turned out to be at the |
| switching layer where the L3 switches were configured incorrectly. |
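| |
| Because the switching layer is the usual culprit, the tagged interface test |
| from the first step is worth repeating whenever in doubt. On hosts where |
| `vconfig` and `ifconfig` are not installed, the same test can be done with |
| the `ip` command from iproute2. The sketch below only prints the commands; |
| the function name `print_vlan_test_commands` is hypothetical. |
| |
```shell
# Dry run: print iproute2 commands that bring up a tagged test interface,
# ping the peer host, and remove the test interface again.
print_vlan_test_commands() {
    local parent="$1"       # physical interface, for example eth0
    local vlan_id="$2"      # VLAN tag to test, for example 64
    local local_cidr="$3"   # test address with prefix, for example 1.2.3.4/24
    local peer_ip="$4"      # test address configured on the other host
    local vlan_if="${parent}.${vlan_id}"
    echo "ip link add link ${parent} name ${vlan_if} type vlan id ${vlan_id}"
    echo "ip addr add ${local_cidr} dev ${vlan_if}"
    echo "ip link set ${vlan_if} up"
    echo "ping -c 3 ${peer_ip}"
    # Clean up once the test is done.
    echo "ip link delete ${vlan_if}"
}
```
| |
| For example, `print_vlan_test_commands eth0 64 1.2.3.4/24 1.2.3.5` prints |
| the commands for host1 in the earlier example. |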
| |
| This section was contributed by Shanker Balan and was originally published on |
| `Shapeblue's blog <http://shankerbalan.net/blog/internet-not-working-on-cloudstack-vms/>`_. |
| |