XenServer Error I Hates

Sometimes I’ll reboot a Linux VM and I can see it went all the way down.  The machine is off as far as I can tell, but it never ends up rebooting.  I then try doing a shutdown via XenCenter and I get the error message: “VM didn’t acknowledge the need to shut down”.

It makes me so angry that the error tells me the VM didn’t ACKNOWLEDGE the NEED to shut down. Really?? I, the administrator, just told this piece of software to shut down, and it’s telling ME that it doesn’t acknowledge my command?  It reminds me of an old error message I saw in Windows 95/98: “Windows will manage these settings”.  I think to myself, am I commanding the computer, or does the computer command me?

Pretty much this GIF explains everything I feel about this error.

Citrix XenServer Upgrade From 6.1 to 6.2

This weekend I upgraded from 6.1 to 6.2. Here are the steps I took to do the upgrade.

  1. Make sure you have upgraded your licenses to 6.2 (May require login).  Licensing has changed to a per-socket structure; we have two sockets per server and were able to upgrade without paying extra.  The upgrade will warn you if you don’t have the 6.2 license, but you can continue on… just make sure the new 6.2 license is applied to your license server.
  2. Download the XenServer-6.2.0-install-cd.iso file (May require login)
  3. Upload the file to a physical Linux web server that is not part of your Xen pool
  4. Create a directory called ‘xen’ in the root of your web site
  5. Mount the ISO as a loop device:
    sudo mount -o loop XenServer-6.2.0-install-cd.iso xen/
  6. Update XenCenter.msi to the latest version. After updating XenCenter, “About XenCenter” shows my version as “6.2 (build 1377)”
  7. Back up your XenServer pool: SSH into your pool master and run this command, then copy the resulting file to your backups on a file server
    xe pool-dump-database file-name=xenpool.backup.20140510
  8. Disable HA
  9. Shut down any unneeded VMs; this will decrease the time it takes to perform the upgrade
  10. To perform the upgrade, in XenCenter go to Tools -> Rolling Pool Upgrade
  11. Select your pool
  12. Select ‘Automatic Mode’
  13. Obey the rules of the pre-check
  14. Run the upgrade.  I had a few issues and had to restart the rolling upgrade, which didn’t seem to cause any problems.  My issues were mostly VMs losing their VDIs, which can be fixed via this article.
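
The command-line parts of the steps above (4, 5, 7 and 8) could be wrapped up roughly like this. It is only a sketch: the function names and the web_root argument are made up for illustration, and using xe pool-ha-disable for step 8 is an assumption, since the steps don’t say how HA was disabled.

```shell
# Build a date-stamped backup file name such as xenpool.backup.20140510.
# Pass a date as YYYYMMDD, or omit it to use today's date.
pool_backup_name() {
    printf 'xenpool.backup.%s\n' "${1:-$(date +%Y%m%d)}"
}

# Steps 4-5: create the 'xen' directory under the web root and loop-mount
# the install ISO there. web_root is a hypothetical argument; pass your own
# document root.
mount_install_iso() {
    web_root="$1"
    mkdir -p "$web_root/xen" &&
        sudo mount -o loop XenServer-6.2.0-install-cd.iso "$web_root/xen/"
}

# Steps 7-8: run on the pool master before starting the rolling upgrade.
# Disabling HA from the CLI is an assumption; XenCenter works as well.
backup_pool_and_disable_ha() {
    xe pool-dump-database file-name="$(pool_backup_name)" &&
        xe pool-ha-disable
}
```

Nothing runs until you call one of the functions, for example mount_install_iso /var/www/html on the web server and backup_pool_and_disable_ha on the pool master.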

Citrix XenServer Error: Detaching SR – General backend error

After a power outage we ran into some errors after bringing everything back up.  When trying to reconnect a CIFS share on our NetApp filer, we got the following errors.

Jan 3, 2014 10:40:10 AM Error: Repairing SR CIFS ISO library On Netapp - Unable to mount the directory specified in device configuration request
Jan 3, 2014 10:40:54 AM Error: Detaching SR 'CIFS ISO library On Netapp' from 'SOME POOL' - General backend error

At first I thought the issue was XenServer not being joined to the Active Directory domain. But I was wrong: the issue was that the NetApp wasn’t joined to the AD domain. So make sure everything is on the domain and you may get rid of these errors. For us, the time on the NetApp was off by 6 minutes, which caused errors when trying to join AD (the clocks need to be in sync to within 5 minutes).
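
To make the arithmetic concrete: 6 minutes is 360 seconds, which is over the 300 seconds that a 5-minute limit allows. A small check like the one below could compare two clocks. It is a sketch only; the function name is made up, and how you read the filer’s clock as an epoch timestamp is left to you.

```shell
# Succeeds when two epoch timestamps (in seconds) differ by no more than
# max_skew seconds. The default of 300 matches the 5-minute limit.
within_clock_skew() {
    a="$1"
    b="$2"
    max_skew="${3:-300}"
    diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max_skew" ]
}
```

For example, within_clock_skew "$(date +%s)" "$filer_epoch" || echo "clock skew too large", where filer_epoch is a hypothetical variable holding the filer’s time.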

Citrix XenServer SR Backend Failure Error Codes

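Each entry lists the error name, its description, and the numeric code (the number at the end of SR_BACKEND_FAILURE_NN).

SRInUse
The SR device is currently in use.
16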

VDIInUse
The VDI is currently in use.
24

LockErr
The lock or unlock request failed.
37

Unimplemented
The requested method is not supported or implemented.
38

SRNotEmpty
The Storage Repository is not empty.
39

ConfigLUNMissing
The request is missing the LUNid parameter.
87

ConfigSCSIid
The SCSIid parameter is missing or incorrect.
107

ISODconfMissingLocation
‘Location’ parameter must be specified in Device Configuration.
220

ISOMustHaveISOExtension
ISO name must have .iso extension.
221

ISOMountFailure
Could not mount the directory specified in Device Configuration.
222

ISOUnmountFailure
Could not unmount the directory specified in Device Configuration.
223

ISOSharenameFailure
Could not locate the ISO sharename on the target, or the access permissions might be incorrect.
224

ISOLocationStringError
Incorrect Location string format. String must be in the format SERVER:PATH for NFS targets, or \\SERVER\PATH for CIFS targets.
225

ISOLocalPath
Invalid local path
226

InvalidArg
Invalid argument
1

BadCharacter
A bad character was detected in the dconf string.
2

InvalidDev
No such device
19

InvalidSecret
No such secret
20

SRScan
The Storage Repository scan failed.
40

SRLog
The Storage Repository log operation failed.
41

SRExists
The Storage Repository already exists.
42

VDIExists
The VDI already exists.
43

SRNoSpace
There is insufficient space.
44

VDIUnavailable
The VDI is not available.
46

SRUnavailable
The Storage Repository is not available.
47

SRUnknownType
Unknown repository type
48

SRBadXML
Malformed XML string
49

LVMCreate
Logical Volume creation error
50

LVMDelete
Logical Volume deletion error
51

LVMMount
Logical Volume mount or activate error
52

LVMUnMount
Logical Volume unmount or deactivate error
53

LVMWrite
Logical Volume write error
54

LVMPartCreate
Logical Volume partition creation error
55

LVMPartInUse
Logical Volume partition in use
56

LVMFilesystem
Logical Volume file system creation error
57

LVMMaster
Logical Volume request must come from master.
58

LVMResize
Logical Volume resize failed.
59

LVMSize
Logical Volume invalid size
60

FileSRCreate
File Storage Repository creation error
61

FileSRRmDir
File Storage Repository failed to remove directory.
62

FileSRDelete
File Storage Repository deletion error
63

VDIRemove
Failed to remove VDI.
64

VDILoad
Failed to load VDI.
65

VDIType
Invalid VDI type
66

ISCSIDevice
ISCSI device failed to appear.
67

ISCSILogin
ISCSI login failed, verify CHAP credentials.
68

ISCSILogout
ISCSI logout failed.
69

ISCSIInitiator
Failed to set ISCSI initiator.
70

ISCSIDaemon
Failed to start ISCSI daemon.
71

NFSVersion
Required NFS server version unsupported.
72

NFSMount
NFS mount error
73

NFSUnMount
NFS unmount error
74

NFSAttached
NFS mount point already attached.
75

NFSDelete
Failed to remove NFS mount point.
76

NFSTarget
Unable to detect an NFS service on this target.
108

LVMGroupCreate
Logical Volume group creation failed.
77

VDICreate
VDI Creation failed.
78

VDISize
VDI Invalid size
79

VDIDelete
Failed to mark VDI hidden.
80

VDIClone
Failed to clone VDI.
81

VDISnapshot
Failed to snapshot VDI.
82

ISCSIDiscovery
ISCSI discovery failed.
83

ISCSIIQN
ISCSI target and received IQNs differ.
84

ISCSIDetach
ISCSI detach failed.
85

ISCSIQueryDaemon
Failed to query the ISCSI daemon.
86

NFSCreate
NFS Storage Repository creation error
88

ConfigLUNIDMissing
The request is missing the LUNid parameter.
89

ConfigDeviceMissing
The request is missing the device parameter.
90

ConfigDeviceInvalid
The device is not a valid path.
91

VolNotFound
The volume cannot be found.
92

PVSfailed
PVS failed
93

ConfigLocationMissing
The request is missing the location parameter.
94

ConfigTargetMissing
The request is missing the target parameter.
95

ConfigTargetIQNMissing
The request is missing or has an incorrect target IQN parameter.
96

ConfigISCSIIQNMissing
Unable to retrieve the host configuration ISCSI IQN parameter.
97

ConfigLUNSerialMissing
The request is missing the LUN serial number.
98

LVMOneLUN
Only 1 LUN may be used with shared LVM.
99

LVMNoVolume
Cannot find volume
100

ConfigServerPathMissing
The request is missing the serverpath parameter.
101

ConfigServerMissing
The request is missing the server parameter.
102

ConfigServerPathBad
The serverpath argument is not valid.
103

LVMRefCount
Unable to open the refcount file
104

Rootdev
Root system device, cannot be used for virtual machine storage.
105

InvalidIQN
The IQN provided is an invalid format.
106

SnapshotChainTooLong
The snapshot chain is too long.
109

VDIResize
VDI resize failed.
110

APISession
Failed to initialize XMLRPC connection.
150

APILocalhost
Failed to query Local Control Domain.
151

APIPBDQuery
A failure occurred querying the PBD entries.
152

APIFailure
A failure occurred accessing an API object.
153

NAPPTarget
Netapp Target parameter is missing in Dconf string.
120

NAPPUsername
Netapp Username parameter is missing in Dconf string.
121

NAPPPassword
Netapp Password parameter is missing in Dconf string.
122

NAPPAggregate
Netapp Aggregate parameter is missing in Dconf string.
123

NAPPTargetFailed
Failed to connect to Netapp target.
124

NAPPAuthFailed
Authentication credentials incorrect
125

NAPPInsufficientPriv
Authentication credentials have insufficient access privileges.
126

NAPPFVolNum
Maximum number of flexvols reached on target. Unable to allocate the requested resource.
127

NAPPSnapLimit
Maximum number of Snapshots reached on target Volume. Unable to create the snapshot.
128

NAPPSnapNoMem
Insufficient space, unable to create the snapshot.
129

NAPPUnsupportedVersion
Netapp Target version unsupported
130

NAPPTargetIQN
Unable to retrieve target IQN
131

NAPPNoISCSIService
ISCSI service is not running on the Netapp target.
132

NAPPAsisLicense
Failed to enable A-SIS for the SR. Requires valid license on the filer.
133

NAPPAsisError
The filer does not support A-SIS on this aggregate. The license is valid; however, on some filers A-SIS is limited to smaller aggregates. For example, the FAS3020 maximum supported aggregate is 1 TB. Refer to filer support documentation for details on the model. You must either disable A-SIS support, or re-configure the aggregate to the maximum supported size.
134

NAPPExclActivate
Failed to acquire an exclusive lock on the LUN.
135

DNSError
Incorrect DNS name, unable to resolve.
140

ISCSITarget
Unable to connect to ISCSI service on target
141

ISCSIPort
Incorrect value for ISCSI port, must be a number between 1 and 65535.
142

BadRequest
Failed to parse the request.
143

VDIMissing
VDI could not be found.
144

EQLTarget
Equallogic Target parameter is missing in Dconf string.
160

EQLUsername
Equallogic Username parameter is missing in Dconf string.
161

EQLPassword
Equallogic Password parameter is missing in Dconf string.
162

EQLStoragePool
Equallogic StoragePool parameter is missing in Dconf string.
163

EQLConnectfail
Failed to connect to Equallogic Array; maximum SSH CLI sessions reached
164

EQLInvalidSnapReserve
Invalid snap-reserver-percentage value, must be an integer indicating the amount of space, as a percentage of the VDI size, to reserve for snapshots.
165

EQLInvalidSnapDepletionKey
Invalid snap-depletion value, must be one of ‘delete-oldest’ or ‘volume-offline’.
166

EQLVolOutofSpace
Volume out of space, probably due to insufficient snapshot reserve allocation.
167

EQLSnapshotOfSnapshot
Cannot create Snapshot of a Snapshot VDI, operation unsupported.
168

EQLPermDenied
Failed to connect to Equallogic Array, Permission denied; username/password invalid
169

EQLUnsupportedVersion
Equallogic Target version unsupported.
170

EQLTargetPort
Unable to logon to Array. Check IP settings.
171

EQLInvalidStoragePool
Equallogic StoragePool parameter specified in Dconf string is Invalid.
172

EQLInvalidTargetIP
Equallogic Target parameter specified in Dconf string is Invalid, specify the correct Group IP address.
173

EQLInvalidSNMPResp
Invalid SNMP response received for a command line interface command.
174

EQLInvalidVolMetaData
Volume metadata stored in the ‘Description’ field is invalid. This field contains encoded data and is not user editable.
175

EQLInvalidEOFRecv
Invalid EOF response received for a CLI command.
176

LVMProvisionAttach
Volume Group out of space. The SR is over-provisioned, and out of space. Unable to grow the underlying volume to accommodate the virtual size of the disk
180

MetadataError
Error in Metadata volume operation for SR.
181

EIO
General IO error
200

EGAIN
Currently unavailable, try again
201

SMGeneral
General backend error
202

FistPoint
An active FIST point was reached that causes the process to exit abnormally.
203

CSLGConfigServerMissing
The CSLG server name or IP address is missing.
400

CSLGConfigSSIDMissing
The Storage System ID is missing.
401

CSLGConfigPoolIDMissing
The Storage Pool ID is missing.
402

CSLGProtocolCheck
The GSSI operation to the CSLG server failed.
410

CSLGLoadSR
The Storage Repository loading operation failed.
411

CSLGInvalidProtocol
An invalid storage protocol was specified
412

CSLGXMLParse
Unable to parse XML
413

CSLGProbe
Failed to probe Storage Repository.
414

CSLGSnapClone
Snapshot/Clone failed.
416

CSLGAssign
Storage assignment failed.
417

CSLGUnassign
Storage un-assignment failed.
418

CSLGAllocate
Storage allocation failed.
419

CSLGDeallocate
Storage deallocation failed.
420

CSLGHBAQuery
HBA Query failed.
421

CSLGISCSIInit
IQN/ISCSI initialization failed.
422

CSLGDeviceScan
SCSI device scan failed.
423

CSLGServer
Failed to connect to CSLG.
424

CSLGConfigSVIDMissing
The Storage Node ID is missing.
425

CSLGIntroduce
The VDI failed to be introduced to the database.
426

CSLGNotInstalled
The CSLG software does not seem to be installed.
427

CSLGPoolCreate
Failed to create multiple sub-pools from parent pool.
428

CSLGOldXML
Current XML definition is newer version.
429

MultipathdCommsFailure
Failed to communicate with the multipath daemon.
430

CSL_Integrated
Error in storage adapter communication
431

CSLGConfigAdapterMissing
The adapter id is missing or unknown
432

CSLGConfigUsernameMissing
The username is missing
433

CSLGConfigPasswordMissing
The password is missing
434

CSLGInvalidSSID
An invalid storage system ID was specified
435

CSLGSysInfo
Failed to collect storage system information
436

CSLGPoolInfo
Failed to collect storage pool information
437

CSLGPoolDelete
Failed to delete storage pool
438

CSLGLunInfo
Failed to collect storage volume information
439

CSLGLunList
Failed to list storage volume
440

CSLGResizeLun
Failed to resize storage volume
441

CSLGTargetPorts
Failed to list storage target ports
442

CSLGPoolList
Failed to list storage pool
443

TapdiskFailed
The tapdisk failed
444

TapdiskAlreadyRunning
The tapdisk is already running
445

CIFSExtendedCharsNotSupported
XenServer does not support extended characters in CIFS paths, usernames, passwords, and file names.
446

IllegalXMLChar
Illegal XML character.
447
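
When an error shows up as SR_BACKEND_FAILURE_NN, the number can be pulled out and matched against the list above. The helper below is a rough sketch: the function names are made up, and only a handful of codes from the list are included, so extend the case statement with the ones you run into.

```shell
# Extract the numeric code from a string such as
# "Error code: SR_BACKEND_FAILURE_46". Prints nothing if no code is found.
sr_failure_code() {
    printf '%s\n' "$1" |
        sed -n 's/.*SR_BACKEND_FAILURE_\([0-9][0-9]*\).*/\1/p'
}

# Map a few codes from the list above to their error names.
sr_failure_name() {
    case "$1" in
        16)  echo "SRInUse" ;;
        24)  echo "VDIInUse" ;;
        46)  echo "VDIUnavailable" ;;
        47)  echo "SRUnavailable" ;;
        73)  echo "NFSMount" ;;
        202) echo "SMGeneral" ;;
        222) echo "ISOMountFailure" ;;
        *)   echo "unknown" ;;
    esac
}
```

For example, sr_failure_name "$(sr_failure_code 'Error code: SR_BACKEND_FAILURE_46')" prints the name for code 46.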

Citrix XenServer 6.1 Internal error: File “xapi_xenops.ml”, line 1788, characters 3-9: Assertion failed

Today I rebooted a Xen host. Everything came back up as I would expect, but when I tried to live migrate VMs back onto the host I got this error message.

Internal error: File "xapi_xenops.ml", line 1788, characters 3-9: Assertion failed

I thought something was wrong with XenCenter, but then I tried via the command line and got this error:

Error code: SR_BACKEND_FAILURE_46
Error parameters: , The VDI is not available [opterr=VDI SOME-UUID already attached RW]

I’ve seen the message “The VDI is not available” many times before, and I knew what to do.

Citrix XenServer PCI Passthrough

Sometimes you need to use a PCI device on a virtual machine. We have a vendor who has supplied a USB dongle for software licensing, which is really annoying. Their software doesn’t merit its own server, so we want to put it into our XenServer pool.

    1. Shut down the VM you want to attach your device to.
    2. SSH into the host that has your PCI device.  Use the following command to locate the PCI ID for your device.  It should look something like this: “00:1d.7”
      lspci -v
    3. Now locate the UUID for the VM you want to attach the PCI device to with the following command
      xe vm-list
    4. Now use the following command to set the pass through
      xe vm-param-set other-config:pci=0/"PCI ID" uuid="UUID of VM"
    5. Now boot the VM and check if your device is now attached.  If not, make sure you are using the correct PCI ID.  USB devices can get tricky, as there may be many USB devices listed.
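
Steps 3 and 4 above could be combined into a small helper along these lines. Treat it as a sketch under assumptions: the function names are made up, and adding the 0000: PCI domain prefix to the short ID is an assumption that is not part of the steps above (lspci -D prints the domain-prefixed form). If your host accepts the short form, drop pci_full_address.

```shell
# Prefix a bus:device.function address (e.g. 00:1d.7) with PCI domain 0000
# unless a domain is already present (e.g. 0000:00:1d.7).
pci_full_address() {
    case "$1" in
        *:*:*) printf '%s\n' "$1" ;;
        *)     printf '0000:%s\n' "$1" ;;
    esac
}

# Build the value for other-config:pci as used in step 4.
pci_passthrough_value() {
    printf '0/%s\n' "$(pci_full_address "$1")"
}

# Look up the VM's UUID by name label and set the passthrough.
# Run this on the host that has the device, with the VM shut down.
set_pci_passthrough() {
    vm_name="$1"
    pci_id="$2"
    vm_uuid="$(xe vm-list name-label="$vm_name" params=uuid --minimal)"
    xe vm-param-set other-config:pci="$(pci_passthrough_value "$pci_id")" uuid="$vm_uuid"
}
```

Usage would be something like set_pci_passthrough "license-server" 00:1d.7, where license-server is a hypothetical VM name.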

Citrix XenServer dom0 host “random” reboot

The other day I started to get paged around 12:10 AM because one of our XenServer hosts decided it was time to reboot. We have high availability (HA) turned on, so all the VMs running on this host were rebooted on other hosts per our HA config. This was good: it meant the paging would stop once all the servers were running again.

But what was the cause of this reboot? Of course I went straight to the logs, and this is what I found in /var/log/kern.log:

Nov 15 00:05:59 xenserverhostname kernel: [2461330.653319] nfs: server 10.0.0.11 not responding, timed out

Looks like NFS was timing out to my storage backend, which is a NetApp FAS22xx. We’ve never seen a performance issue with our NetApp before, but it looks like we have something going on now. I noticed that we had a lot of volumes scheduled to run deduplication jobs starting at midnight. I spread those out a bit so they weren’t all trying to run at the same time. I also noticed that the volume holding our XenServer HA heartbeat was getting dedup’d as well. I turned that off because the heartbeat only takes up a few MB.

I also noticed that this timeout has been logged before, but not often enough to cause a host to reboot. I believe HA/Xen will reboot the host once it goes over a timeout threshold, and that is why the server rebooted. I think we are dealing with a couple of issues, and I hate to use the term “perfect storm”, but it seems fitting. I think a lot of NetApp jobs kicking off at midnight, I/O-heavy jobs starting on VMs at midnight, and issues with how XenServer handles timeouts were all at play. I think spreading out jobs on the NetApp and on the VMs, and applying patches, will help, but only time will tell if it does.
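
To see how often the timeout was logged before it got bad enough to matter, the log can be summarized per hour. This is a sketch that assumes the kern.log line format shown above (month, day, time, hostname, ...); the function name is made up.

```shell
# Count "nfs: server ... not responding" messages per hour.
# Reads kern.log-style lines on stdin and prints "Mon DD HH count" lines.
# Output is sorted as text, so months are not in calendar order.
nfs_timeouts_per_hour() {
    grep 'nfs: server .* not responding' |
        awk '{ split($3, t, ":"); n[$1 " " $2 " " t[1]]++ }
             END { for (k in n) print k, n[k] }' |
        sort
}
```

Run it as nfs_timeouts_per_hour < /var/log/kern.log and look for hours where the count spikes, for example around midnight.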

http://support.citrix.com/article/CTX135623