Citrix XenServer 6.1 Internal error: File “xapi_xenops.ml”, line 1788, characters 3-9: Assertion failed

Today I rebooted a Xen host and everything came back up as I would expect, but when I tried to live migrate VMs back onto the host I got this error message.

Internal error: File "xapi_xenops.ml", line 1788, characters 3-9: Assertion failed

I thought something was wrong with XenCenter, but then I tried via the command line and got this error:

Error code: SR_BACKEND_FAILURE_46
Error parameters: , The VDI is not available [opterr=VDI SOME-UUID already attached RW]

I’ve seen the message “The VDI is not available” many times before, so I knew what to do.

Citrix XenServer PCI Passthrough

Sometimes you need to use a PCI device on a virtual machine. We have a vendor who has supplied a USB dongle for software licensing, which is really annoying. Their software doesn’t merit its own server, so we want to put it into our XenServer pool.

    1. Shut down the VM you want to attach your device to.
    2. SSH into the host that has your PCI device. Use the following command to locate the PCI ID for your device. It should look something like “00:1d.7”.
      lspci -v
    3. Now locate the UUID for the VM you want to attach the PCI device to with the following command
      xe vm-list
    4. Now use the following command to set the pass through
      xe vm-param-set other-config:pci=0/"PCI ID" uuid="UUID of VM"
    5. Now boot the VM and check if your device is attached. If not, make sure you are using the correct PCI ID. USB devices can get tricky, as there may be many USB devices listed.
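
Since the USB entries are the tricky part, one way to narrow things down is to filter the lspci output for USB controllers only. This is a minimal sketch, assuming the default one-line-per-device lspci output where the PCI ID is the first field; usb_controller_ids is a hypothetical helper name, not something that ships with XenServer.

```shell
# Hypothetical helper: print the PCI ID (first field) of every USB
# controller found in lspci output supplied on stdin.
usb_controller_ids() {
    grep -i 'usb controller' | awk '{ print $1 }'
}

# Typical use on the host (commented out so the sketch has no side effects):
# lspci | usb_controller_ids
```

Each ID it prints is a candidate for step 4, so you may still need to try more than one before the dongle shows up in the VM.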

Citrix XenServer dom0 host “random” reboot

The other day I started to get paged around 12:10 AM because one of our XenServer hosts decided it was time to reboot. We have high availability (HA) turned on, so all the VMs running on this host were restarted on other hosts per our HA config. This was good, because it meant the paging would stop once all the servers were running again.

But what was the cause of this reboot? Of course I go straight to the logs and this is what I found in /var/log/kern.log

Nov 15 00:05:59 xenserverhostname kernel: [2461330.653319] nfs: server 10.0.0.11 not responding, timed out

Looks like NFS was timing out to my storage backend, which is a NetApp FAS22xx. We’ve never seen a performance issue with our NetApp, but it looks like we have something going on now. I noticed that we had a lot of volumes scheduled to run deduplication jobs starting at midnight. I spread those out a bit so they weren’t all trying to run at the same time. I also noticed that our XenServer HA heartbeat was getting dedup’d as well. I turned that off because the heartbeat only takes up a few MB.

I also noticed that this timeout has been logged before, but not often enough to cause a host to reboot. I believe HA/Xen will reboot the host once it goes over a timeout threshold, and that is why the server rebooted. I think we are dealing with a couple of issues, and I hate to use the term “perfect storm”, but it seems fitting: a lot of NetApp jobs kicking off at midnight, jobs with lots of I/O starting on the VMs at midnight, and issues with how XenServer handles timeouts were all at play. I think spreading out the jobs on the NetApp and on the VMs and applying patches will help, but only time will tell.
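
To see how often the timeout shows up and whether it clusters around midnight, the log can be summarized per hour. This is a rough sketch, assuming syslog-style lines like the one quoted above (month, day, time as the first three fields); nfs_timeouts_per_hour is a hypothetical helper name.

```shell
# Hypothetical helper: count "nfs: server ... not responding" lines per
# day and hour in syslog-style input on stdin (e.g. /var/log/kern.log).
nfs_timeouts_per_hour() {
    grep 'nfs: server .* not responding' |
        awk '{ split($3, t, ":"); print $1, $2, t[1] ":00" }' |
        sort | uniq -c
}

# Typical use on the host (commented out so the sketch has no side effects):
# nfs_timeouts_per_hour < /var/log/kern.log
```

Each output line is a count followed by the day and hour, so a pile-up in the midnight hour would fit the dedup theory.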

http://support.citrix.com/article/CTX135623

Configure SNMP on Citrix XenServer 6.x

I love Cacti and use it all the time to monitor my system performance. Here are the few steps needed to turn on SNMP on your Citrix XenServer hosts. This document assumes you already have the Performance Monitoring Enhancements Pack for XenServer, SNMP, and Cacti installed and functioning on your network.

  1. Edit: /etc/sysconfig/iptables
    Add the following lines AFTER the line “-A RH-Firewall-1-INPUT -p udp --dport 5353 -d 224.0.0.251 -j ACCEPT”
    -A RH-Firewall-1-INPUT -p udp --dport 161 -j ACCEPT
    -A RH-Firewall-1-INPUT -p udp --dport 162 -j ACCEPT
  2. Execute: service iptables restart
  3. Edit: /etc/snmp/snmpd.conf
    Replace the community with your current SNMP community if you have one.
    # sec.name source community
    com2sec notConfigUser default public
  4. Execute: chkconfig snmpd on
  5. Execute: service snmpd restart
  6. Test from another host: snmpwalk -v 2c -c public xenserver.someplace.com
    Note, replace public with your SNMP community!
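
If you have a lot of hosts, the edit from step 1 can be scripted. The sketch below is one possible way, assuming the rules file contains the udp 5353 line exactly once; with_snmp_rules is a hypothetical helper name, and it only prints the result so you can review it before replacing anything.

```shell
# Hypothetical helper: print the rules file named in $1 with the two SNMP
# ACCEPT rules added right after the existing mDNS (udp 5353) rule.
# It writes to stdout only, so the original file is left untouched.
with_snmp_rules() {
    awk '
        { print }
        /-p udp/ && /--dport 5353/ {
            print "-A RH-Firewall-1-INPUT -p udp --dport 161 -j ACCEPT"
            print "-A RH-Firewall-1-INPUT -p udp --dport 162 -j ACCEPT"
        }
    ' "$1"
}

# Typical use on the host (commented out so the sketch has no side effects):
# with_snmp_rules /etc/sysconfig/iptables > /tmp/iptables.new
```

Compare the new file against the original with diff before copying it into place and restarting iptables.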

Citrix XenServer Designate New Pool Master

Most of the time you don’t need to change the XenServer pool master, but if you need to do maintenance on it you will have to move the pool master role to another host.

Make sure HA is disabled before proceeding if you are using it. This can be done via XenCenter.

xe pool-ha-disable

Find the UUID of the host you want to nominate as the new pool master

xe host-list

Use this command to set the new pool master with the UUID from the previous step

xe pool-designate-new-master host-uuid=<uuid>

Re-enable HA if needed. This can be done via XenCenter as well.

xe pool-ha-enable
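
The first part of this can be wrapped in a small function. This is only a sketch: designate_master is a hypothetical name, it assumes the host name-label you pass in is unique in the pool, and it deliberately leaves re-enabling HA to you once the pool has settled on the new master.

```shell
# Sketch: move the pool master role to the host whose name-label is $1.
# Assumes the name-label matches exactly one host in the pool.
designate_master() {
    # --minimal makes xe print just the UUID instead of the full record
    new_master=$(xe host-list name-label="$1" --minimal)
    if [ -z "$new_master" ]; then
        echo "no host named $1" >&2
        return 1
    fi
    xe pool-ha-disable &&
        xe pool-designate-new-master host-uuid="$new_master"
}

# Typical use on the current master (commented out on purpose):
# designate_master xenhost2
```

If you were not using HA in the first place, skip the function and run the designate command on its own.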

Citrix XenServer 6.1 – XenServer Tools out of date (version 6.0 installed)

Recently we upgraded our XenServer resource pool to 6.1 because the new feature allowing live migration of VDIs sounds unreal. I am still amazed that this stuff works, well, most of the time. Now we are having issues with XenTools being reported as out of date. In XenCenter I see the message:

XenServer Tools out of date (version 6.0 installed)

Great, I’ll just pop the XenTools CD in, uninstall the old version, and install the new version. Nope, that does not work. Instead it makes the issue worse: you try to reboot the system, but XenServer never figures out that the server has shut down, and you end up having to do a vm-reset-powerstate or even destroy the domain. Either way this sucks.

Citrix just updated its XenServer blog yesterday, acknowledging that the issues with XenTools are real and that they will try to fix their processes. And they are right, this sort of thing really does hurt your confidence in the system. And we all know virtualization is built on confidence.

Now here is the real place to get all the info on what needs to happen. It’s a two-part update: first you need to apply hotfix XS61E009 and then XS61E010. Then you have the fun of updating XenTools. That is a lot of work nonetheless when you have 10s, 100s, or 1000s of VMs.

http://support.citrix.com/article/CTX135099
http://support.citrix.com/article/CTX136252
http://support.citrix.com/article/CTX136253

*UPDATE*

If you run into a blue screen on Windows while installing the new XenTools from XS61E010 (I know, more issues?), try getting into safe mode, removing XenTools from the machine, and reinstalling; that may work. Otherwise you can try looking up your VM’s UUID and changing the device ID. This may work if your VM does not get the correct device ID set during the new XenTools install. Use this command:

xe vm-param-set uuid=<vm_uuid> platform:device_id=0001

or

xe vm-param-set uuid=<vm_uuid> platform:device_id=0002

Citrix XenServer System Recovery Guide

I found this white paper from the early days of XenServer. I think it’s worth a once-over, as most of the information and logic remain valid. The flow chart of recovery steps makes it look pretty simple and can maybe help someone out of a jam. This is definitely something you want in your disaster recovery plan!

XenServer System Recovery Guide

Original source: http://support.citrix.com/servlet/KbServlet/download/17140-102-671536/XenServer%20System%20Recovery%20Guide.pdf

Citrix XenServer Error: VDI is not Available

Error: Starting VM ” – The VDI is not available

So you’re trying to boot a VM in XenServer but you are getting the error “VDI is not Available”. This means that the VM crashed, the Xen host crashed, or something bad just happened. Either way you need your server back.

  1. Find the UUID of the VDI in question.
    xe vdi-list
  2. Note exactly which VDI UUID maps to which drive on your server. The next command removes the VDI from the VM so we can reattach it correctly, and drive order matters: you don’t want to swap an OS VDI with a data VDI.
    xe vdi-forget uuid=<VDI UUID we found in step 1>
  3. Open XenCenter and navigate to the SR with your VDI. Hit Rescan.
  4. Now go to the VM with issues and attach the VDI via the Storage tab
  5. Boot your VM
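
For the “note what maps to what” part of step 2, it helps to write the mapping to a file before running vdi-forget. This is a small sketch, assuming the VBD records for the VDI still exist; save_vdi_mapping and the default file name vdi-mapping.txt are hypothetical.

```shell
# Sketch: append the VM name and device slot of every VBD that uses the
# given VDI to a notes file, so the drive order can be restored later.
# $1 is the VDI UUID from step 1, $2 an optional notes file.
save_vdi_mapping() {
    vdi_uuid=$1
    notes=${2:-vdi-mapping.txt}
    {
        echo "VDI $vdi_uuid"
        xe vbd-list vdi-uuid="$vdi_uuid" params=vm-name-label,userdevice,device
    } >> "$notes"
}

# Typical use before step 2's vdi-forget (commented out on purpose):
# save_vdi_mapping <VDI UUID we found in step 1>
```

The userdevice value is the slot number to aim for when you reattach the disk in step 4.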

Install Dell Openmanage on Citrix XenServer for Nagios checks

Like any good sysadmin, you want to know if anything is happening to your Dell hardware at any given moment. Here is what I did to get Dell OpenManage installed on Citrix XenServer 5.6, 6.0, and 6.1. Once OpenManage is installed and working, you can have Nagios ssh into the XenServer host and run a check (this may be covered in another post).

  1. I now send you on a quest. Head to the Dell website and start searching for the software. Look for something named “Dell OpenManage Server Administrator Managed Node (Distribution Specific)”, also called “OpenManage Supplemental Pack” or “OpenManage Server Administrator Managed Node”, with a file name like “OM-SrvAdmin-Dell-Web-LX-7.1.0-5304.XenServer60_A00.iso”.
  2. Transfer the iso to your XenServer host via scp.
  3. mount -o loop <openmanage-supplemental-pack-filename>.iso /mnt
  4. cd /mnt
  5. ./install.sh
  6. /etc/init.d/dataeng start
  7. Log out and back in, and this command should work:
    omreport storage pdisk controller=0
  8. /usr/sbin/useradd nagios
  9. passwd nagios
  10. cd /home/nagios
  11. mkdir .ssh
  12. Now we need to generate or install a ssh key for Nagios to login without a password.  Here is how you would generate one:
    ssh-keygen -t dsa -b 1024 -f .ssh/id_dsa
    cat .ssh/id_dsa.pub >> .ssh/authorized_keys
  13. chown -R nagios:nagios .ssh
  14. chmod 750 .ssh
  15. chmod 640 .ssh/*
  16. mkdir bin
  17. chown -R nagios:nagios bin
  18. chmod 750 bin
  19. Get the Nagios check script, which will be executed by Nagios when it logs in via ssh
    wget http://folk.uio.no/trondham/software/files/check_openmanage-3.7.3.tar.gz
  20. tar -xzvf check_openmanage-3.7.3.tar.gz
  21. cp check_openmanage-3.7.3/check_openmanage bin/
  22. If you are running XenServer 6 or higher, you will need to run this command
    chmod o+rx /
  23. Log into your Nagios server
  24. Copy the SSH key pair (id_dsa and id_dsa.pub) to the Nagios server, into the nagios user’s ~/.ssh
  25. Test logging in without a password
  26. Set up the Nagios checks (I plan on posting this some day)
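
For step 25, here is a sketch of how the login test could look when run as the nagios user on the Nagios server. It assumes the private key from step 24 sits at ~/.ssh/id_dsa; nagios_ssh is a hypothetical wrapper name and xenhost is a placeholder host name.

```shell
# Sketch: run a command on the XenServer host as the nagios user using
# only the key from step 24. BatchMode=yes makes ssh fail instead of
# prompting, so a broken key setup shows up as an error right away.
nagios_ssh() {
    host=$1
    shift
    ssh -o BatchMode=yes -i "$HOME/.ssh/id_dsa" "nagios@$host" "$@"
}

# Typical use on the Nagios server (commented out on purpose):
# nagios_ssh xenhost true                  # step 25: login test
# nagios_ssh xenhost bin/check_openmanage  # run the check copied in step 21
```

If the first call returns without asking for a password, the second one is roughly what a Nagios command definition would end up running.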

Citrix XenServer How to Force Shutdown Virtual Machines

Sometimes you just can’t get a VM to shut down; it may be an issue with XenTools or sun spots. Here is a list of commands that will help you get that damn thing shut down.

  1. Disable High Availability (HA) so you don’t run into issues
  2. Log into the XenServer host that is running your VM with issues via ssh or the console in XenCenter
  3. Run the following command to list VMs and their UUIDs
    xe vm-list
  4. First you can try just the normal shutdown command with force
    xe vm-shutdown uuid=<UUID from step 3> force=true
  5. If that just hangs, use CONTROL+C to kill it off and try to reset the power state.  The force is required on this command
    xe vm-reset-powerstate  uuid=<UUID from step 3> force=true
  6. If the VM is still not shut down, we may need to destroy the domain
  7. Run this command to get the domain ID of the VM. It is the number in the first column of the row that shows your VM’s UUID
    list_domains
  8. Now run this command using the domain ID from the output of step 7
    Before XenServer 7.x:
    /opt/xensource/debug/xenops destroy_domain -domid <DOMID from step 7>
    XenServer 7.x and greater:
    xl destroy <DOMID from step 7>
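
Picking the domain ID out of the list_domains output by eye is error-prone when a host runs a lot of VMs, so here is a sketch that does the lookup. It assumes the output rows look like “ 12 | <uuid> | <state>”; domid_for_uuid is a hypothetical helper name.

```shell
# Hypothetical helper: read list_domains output on stdin and print the
# domain ID for the VM UUID given in $1.
# Assumes rows of the form " 12 | <uuid> | <state>".
domid_for_uuid() {
    awk -F'|' -v want="$1" '
        { id = $1; uuid = $2; gsub(/ /, "", id); gsub(/ /, "", uuid) }
        uuid == want { print id }
    '
}

# Typical use on the host (commented out so the sketch has no side effects):
# list_domains | domid_for_uuid <UUID from step 3>
```

Double check the number it prints before feeding it to the destroy command in step 8, since destroying the wrong domain takes down a healthy VM.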