Introduction
A few days ago, an incident was assigned to me saying that an end user could not power on one of his virtual machines. I tried to power it on myself and an error message popped up, but the message was not really helpful. All it said was “A general system error occurred: The virtual machine could not start”.
Symptom
As described in the introduction, the error message did not tell me what the problem was. A figure is attached below.
The next thing I looked into was vmware.log, where I found that there was a problem creating the swap file.
vmware.log
2014-06-29T22:27:18.702Z| vmx| CreateVM: Swap: generating normal swap file name.
2014-06-29T22:27:18.704Z| vmx| Swap file path: '/vmfs/volumes/521439a7-bd74efb3-d915-d4ae52a522bf/aaa/aaa-aa1fa36d.vswp'
2014-06-29T22:27:18.704Z| vmx| VMXVmdb_GetDigestDiskCount: numDigestDisks = 0
2014-06-29T22:27:18.705Z| vmx| Msg_Post: Error
2014-06-29T22:27:18.705Z| vmx| [msg.dictionary.writefile.truncate] An error occurred while truncating configuration file "/vmfs/volumes/521439a7-bd74efb3-d915-d4ae52a522bf/aaa/aaa.vmx":Cannot allocate memory.
2014-06-29T22:27:18.705Z| vmx| [vob.heap.grow.max.reached] Heap vmfs3 already at its maximum size of 83887056. Cannot expand.
2014-06-29T22:27:18.705Z| vmx| [vob.heap.grow.max.reached] Heap vmfs3 already at its maximum size of 83887056. Cannot expand.
2014-06-29T22:27:18.705Z| vmx| [vob.swap.poweron.createfailure.status] Failed to create swap file '/vmfs/volumes/521439a7-bd74efb3-d915-d4ae52a522bf/shrwebprd02/shrwebprd02-aa1fa36d.vswp' : Out of memory
2014-06-29T22:27:18.705Z| vmx| [msg.vmmonVMK.creatVMFailed] Could not power on VM : Out of
2014-06-29T22:27:18.705Z| vmx| [msg.monitorLoop.createVMFailed.vmk] Failed to power on VM.
2014-06-29T22:27:18.705Z| vmx| ----------------------------------------
2014-06-29T22:27:18.838Z| vmx| Module MonitorLoop power on failed.
2014-06-29T22:27:18.838Z| vmx| VMX_PowerOn: ModuleTable_PowerOn = 0
2014-06-29T22:27:18.840Z| vmx| Vix: [28466190 mainDispatch.c:4084]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2014-06-29T22:27:18.840Z| vmx| Vix: [28466190 mainDispatch.c:4103]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.
2014-06-29T22:27:18.840Z| vmx| Vix: [28466190 mainDispatch.c:4084]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2014-06-29T22:27:18.840Z| vmx| Vix: [28466190 mainDispatch.c:4103]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.
2014-06-29T22:27:18.840Z| vmx| Transitioned vmx/execState/val to poweredOff
2014-06-29T22:27:18.842Z| vmx| Vix: [28466190 mainDispatch.c:4084]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
2014-06-29T22:27:18.842Z| vmx| Vix: [28466190 mainDispatch.c:4103]: VMAutomation: Ignoring ReportPowerOpFinished because the VMX is shutting down.
2014-06-29T22:27:18.842Z| vmx| VMX idle exit
To summarise the log above:
- The ESXi server tried to create a swap file so that the virtual machine could power on
- Creating the swap file failed because the VMFS3 heap was already at its maximum size
- The virtual machine failed to power on
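If you have to dig through a much longer vmware.log, the same summary can be pulled out with a small filter. The following is only a rough sketch in Python (not part of the tooling I used on the day). It assumes the log lines keep the "timestamp| vmx| [message.id] text" layout seen above; the function names tagged_messages and heap_limits are made up for illustration.

```python
import re

# Sample lines copied from the vmware.log excerpt above.
SAMPLE_LOG = """\
2014-06-29T22:27:18.705Z| vmx| Msg_Post: Error
2014-06-29T22:27:18.705Z| vmx| [vob.heap.grow.max.reached] Heap vmfs3 already at its maximum size of 83887056. Cannot expand.
2014-06-29T22:27:18.705Z| vmx| [vob.swap.poweron.createfailure.status] Failed to create swap file '/vmfs/volumes/521439a7-bd74efb3-d915-d4ae52a522bf/shrwebprd02/shrwebprd02-aa1fa36d.vswp' : Out of memory
2014-06-29T22:27:18.705Z| vmx| [msg.monitorLoop.createVMFailed.vmk] Failed to power on VM.
"""

# Assumed layout: "<timestamp>| vmx| [message.id] free text"
TAGGED = re.compile(r"^(?P<ts>\S+)\| vmx\| \[(?P<id>[\w.]+)\] (?P<text>.*)$")
# Heap exhaustion message as it appears in the excerpt.
HEAP_MAX = re.compile(
    r"Heap (?P<heap>\w+) already at its maximum size of (?P<bytes>\d+)"
)


def tagged_messages(log_text):
    """Return (message_id, text) for every line that carries a [message.id] tag."""
    found = []
    for line in log_text.splitlines():
        match = TAGGED.match(line.strip())
        if match:
            found.append((match.group("id"), match.group("text")))
    return found


def heap_limits(log_text):
    """Return {heap_name: max_size_in_bytes} for each heap reported as full."""
    return {
        m.group("heap"): int(m.group("bytes"))
        for m in HEAP_MAX.finditer(log_text)
    }


if __name__ == "__main__":
    for msg_id, text in tagged_messages(SAMPLE_LOG):
        print(msg_id, "->", text)
    for heap, size in heap_limits(SAMPLE_LOG).items():
        # Convert bytes to MiB so the value is easier to compare with the KB article.
        print(f"{heap}: {size} bytes (~{size / 1024 ** 2:.1f} MiB)")
```

Lines without a bracketed message ID (such as "Msg_Post: Error") are skipped on purpose, so only the tagged error messages remain.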
VMFS HeapSize
The ESXi servers in this environment run 5.0 Update 1. As per the KB article, the default heap size of 80MB allows up to 8TB of active VMDK storage per ESXi host (the maximum of 83887056 bytes reported in the log is roughly 80MB). This means that if a single ESXi server has virtual machines with more than 8TB of active VMDKs, it will refuse to power on more virtual machines.
To check how much VMDK capacity each ESXi server has, I ran a simple PowerCLI script. The script and its output are attached below:
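foreach ($esxi in (Get-Cluster -Name "Name of Cluster" | Get-VMHost | Sort-Object Name)) {
    # Sum the capacity (in GB) of every virtual disk on the VMs running on this host
    $esxi | Select-Object Name, @{N="Sum"; E={ ($esxi | Get-VM | Get-HardDisk | ForEach-Object { $_.CapacityGB } | Measure-Object -Sum).Sum }}
}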
Name,Sum
ESXiA,3335.7681760788
ESXiB,3035.02425670624
ESXiC,3942.765625
ESXiD,4861
ESXiE,4538.28125
ESXiF,16272.9050750732
ESXiA to ESXiE look good, but ESXiF is using approximately 16TB. I checked the history of the virtual machine to see where it was running previously and yes, it was on ESXiF.
One thing to note is that this does not necessarily mean ESXiF has 16TB of active VMDKs. The script sums disk capacity, so the active amount could be much less than 16TB. However, there is a high chance of exceeding 8TB, and that is what happened to this specific virtual machine.
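To make the comparison against the 8TB limit explicit, the script output can be run through a quick check. This is a minimal sketch in Python, assuming the output is saved as CSV and that 1TB equals 1024GB. The helper names hosts_over_limit and least_loaded are hypothetical. Keep in mind that it compares total disk capacity, not active VMDKs, so a flagged host is a candidate for the problem rather than proof of it.

```python
import csv
import io

# Output of the PowerCLI script above: total VMDK capacity per host, in GB.
REPORT = """\
Name,Sum
ESXiA,3335.7681760788
ESXiB,3035.02425670624
ESXiC,3942.765625
ESXiD,4861
ESXiE,4538.28125
ESXiF,16272.9050750732
"""

# 8TB limit from the KB article, expressed in GB (assumption: 1TB = 1024GB).
LIMIT_GB = 8 * 1024


def read_report(report_csv):
    """Parse the CSV report into a list of (host_name, capacity_gb) tuples."""
    rows = csv.DictReader(io.StringIO(report_csv))
    return [(row["Name"], float(row["Sum"])) for row in rows]


def hosts_over_limit(report_csv, limit_gb=LIMIT_GB):
    """Return the hosts whose total VMDK capacity exceeds the limit."""
    return [(name, gb) for name, gb in read_report(report_csv) if gb > limit_gb]


def least_loaded(report_csv):
    """Return the name of the host with the smallest total VMDK capacity."""
    return min(read_report(report_csv), key=lambda item: item[1])[0]


if __name__ == "__main__":
    for name, gb in hosts_over_limit(REPORT):
        print(f"{name} is over the limit: {gb:.0f} GB > {LIMIT_GB} GB")
    print("Least loaded host:", least_loaded(REPORT))
```

With the numbers above, only ESXiF is flagged, and ESXiB comes out as the host with the least VMDK capacity.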
Solution
The solution is quite simple: I vMotioned the virtual machine to ESXiB, which has the lowest total VMDK size, and it was happy.
Ultimately, you will want to upgrade your ESXi servers to 5.5, which improved the VMFS heap. There is an excellent article by Cormac Hogan explaining the enhancement.
Hope this helps.