Introduction
We had been running one vCenter Server 5.1 and two vCenter Servers 5.5 for almost two years. Management decided to decommission the vCenter Server 5.1 instance as it only contained a few clusters, and the work I was involved in was migrating those clusters from vCenter Server 5.1 to vCenter Server 5.5. During the migration I noticed a few interesting behaviours, and in this blog post I will be going through:
- Issues
- Solutions
- Workarounds
Environment
The following is the production vSphere environment I worked on:
Two vCenter Servers
- Destination: vcs5.5, running vCenter Server 5.5 (no Update)
- Source: vcs5.1, running vCenter Server 5.1 (no Update)
2 x ESXi Servers 5.1
- esxi5.1_A
- esxi5.1_B
2 x dvSwitches (version 5.0)
- dvSwitch_VM_Network
  - 2 x 10GbE uplinks
  - NIOC enabled with a custom network resource pool
  - The custom pool is mapped to one portgroup
- dvSwitch_iSCSI_Network
  - 2 x 10GbE uplinks
  - NIOC disabled
Requirement
As in previous migration scenarios, no outage is allowed; a few packet drops are acceptable.
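To quantify "a few packet drops" I kept a simple ping monitor running against a handful of virtual machines and the ESXi hosts for the whole change window. A minimal sketch of that monitor (the target names are placeholders, and it uses the Linux ping syntax):

```python
# Minimal ping monitor used during the migration window (target names are placeholders).
# Counts lost probes per target so "a few packet drops" can be quantified afterwards.
import subprocess
import time

TARGETS = ["vm01", "vm02", "esxi5.1_A"]

def ping(host):
    """Return True if a single ICMP echo to the host succeeds (Linux 'ping' syntax)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

drops = {t: 0 for t in TARGETS}
try:
    while True:
        for target in TARGETS:
            if not ping(target):
                drops[target] += 1
                print("{}: drop #{}".format(target, drops[target]))
        time.sleep(1)
except KeyboardInterrupt:
    print("Summary of dropped probes:", drops)
```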
Risk
Migration of the software iSCSI VMkernel ports configured on dvSwitch_iSCSI_Network.
Mitigation
Export dvSwitch_iSCSI_Network and its corresponding portgroups from vcs5.1 and import them into vcs5.5, preserving the original identifiers.
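The success criterion for this mitigation is simply that the dvSwitch UUID and portgroup keys on vcs5.5 end up identical to those on vcs5.1. A small pyVmomi sketch of that check; pyVmomi, the credentials and the connection details are my additions for illustration, not part of the change itself:

```python
# Hedged sketch: compare dvSwitch UUIDs and portgroup keys between vcs5.1 and vcs5.5
# after the export/import, to confirm the original identifiers were preserved.
# Assumes pyVmomi is available; credentials and hostnames are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def dvs_identifiers(vc_host, user, pwd, dvs_name):
    """Return (dvSwitch uuid, {portgroup key: portgroup name}) for a named dvSwitch."""
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host=vc_host, user=user, pwd=pwd, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.DistributedVirtualSwitch], True)
        for dvs in view.view:
            if dvs.name == dvs_name:
                return dvs.uuid, {pg.key: pg.name for pg in dvs.portgroup}
        raise ValueError("dvSwitch %s not found on %s" % (dvs_name, vc_host))
    finally:
        Disconnect(si)

src = dvs_identifiers("vcs5.1", "administrator", "***", "dvSwitch_iSCSI_Network")
dst = dvs_identifiers("vcs5.5", "administrator@vsphere.local", "***", "dvSwitch_iSCSI_Network")
print("UUIDs match:", src[0] == dst[0])
print("Portgroup keys match:", set(src[1]) == set(dst[1]))
```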
Issues & Solutions
The initial migration plan I came up with was the following (a scripted sketch of the host disconnect/register steps follows the list):
- Export dvSwitch_VM_Network & dvSwitch_iSCSI_Network and import them to vCenter Server 5.5 (pre-work)
- Create a new cluster in vcs5.5, same configuration as in vcs5.1
- Disable HA & DRS on the cluster
- Disconnect & remove esxi5.1_A & esxi5.1_B from vcs5.1
- Register esxi5.1_A & esxi5.1_B to the cluster in vcs5.5
- Migrate the hosts' dvSwitch networking (dvSwitch_VM_Network and dvSwitch_iSCSI_Network) to the imported dvSwitches in vcs5.5
- Enable HA & DRS on the cluster
- Delete the cluster instance in vcs5.1
- Repeat the steps above for the rest of the clusters
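The host disconnect/remove and register steps can be scripted as well. A rough pyVmomi sketch of how I would automate them; I actually performed these steps in the vSphere Client, so the API calls shown are my assumption, and the credentials are placeholders:

```python
# Hedged sketch: move an ESXi host from the vcs5.1 cluster to a cluster in vcs5.5.
# si_src and si_dst are connected ServiceInstance handles (e.g. from SmartConnect).
from pyVim.task import WaitForTask
from pyVmomi import vim

def remove_host_from_source(si_src, host_name):
    """Disconnect the host in the source vCenter and remove it from the inventory."""
    content = si_src.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == host_name)
    WaitForTask(host.DisconnectHost_Task())   # host shows as disconnected, VMs keep running
    WaitForTask(host.Destroy_Task())          # remove the inventory object from vcs5.1

def add_host_to_destination(si_dst, cluster_name, host_name, esxi_user, esxi_pwd):
    """Register the host in the destination cluster in vcs5.5."""
    content = si_dst.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == cluster_name)
    spec = vim.host.ConnectSpec(hostName=host_name, userName=esxi_user,
                                password=esxi_pwd, force=True)
    WaitForTask(cluster.AddHost_Task(spec=spec, asConnected=True))
```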
For step 1, because it doesn't affect the production system, I decided to do it before the change window. The reason for preserving the original distributed switch and portgroup identifiers (the mitigation mentioned above) is to make sure the ESXi servers at the destination vCenter Server pick up the imported dvSwitch without any interruption. Since there are iSCSI VMkernel ports bound to dvSwitch_iSCSI_Network, and bound iSCSI VMkernel ports cannot be migrated live to a different dvSwitch, preserving the original identifiers is essential. However, exporting the dvSwitch configuration from vCenter Server 5.1 and importing it into vCenter Server 5.5 failed with an error.
Looking at the vCenter Server log located under the ProgramData… folder:
```
2015-01-12T15:18:39.937+13:00 [05388 error 'corevalidate' opID=2f3dc91a] [Validate::CheckLacpFeatureCapability] LACP is not supported on DVS [dvSwitch_VM_Network]
2015-01-12T15:18:39.937+13:00 [04620 info 'commonvpxLro' opID=E7F30A2D-0004F88E-7e] [VpxLRO] -- FINISH task-internal-33360670 -- -- vmodl.query.PropertyCollector.retrieveContents --
2015-01-12T15:18:39.937+13:00 [05388 error 'dvsvpxdMoDvsManager' opID=2f3dc91a] [MoDvsManager::CreateNewEntity] Import Failed while creating DVS from Backup with key[51 4d 2d 50 93 51 73 6b-46 47 d0 fa 09 af 88 fc]. Fault:[vmodl.fault.NotSupported]
2015-01-12T15:18:39.937+13:00 [05388 error 'dvsvpxdMoDvsManager' opID=2f3dc91a] [MoDvsManager::CreateNewEntity] Import Failed while creating DVPG from Backup with key[dvportgroup-62]. Fault:[vim.fault.NotFound]
2015-01-12T15:18:39.937+13:00 [05388 error 'dvsvpxdMoDvsManager' opID=2f3dc91a] [MoDvsManager::CreateNewEntity] Import Failed while creating DVPG from Backup with key[dvportgroup-66]. Fault:[vim.fault.NotFound]
2015-01-12T15:18:39.937+13:00 [05388 error 'dvsvpxdMoDvsManager' opID=2f3dc91a] [MoDvsManager::CreateNewEntity] Import Failed for some hosts
2015-01-12T15:18:39.937+13:00 [05388 info 'commonvpxLro' opID=2f3dc91a] [VpxLRO] -- FINISH task-290085 -- -- vim.dvs.DistributedVirtualSwitchManager.importEntity --
2015-01-12T15:18:39.937+13:00 [05388 info 'Default' opID=2f3dc91a] [VpxLRO] -- ERROR task-290085 -- -- vim.dvs.DistributedVirtualSwitchManager.importEntity: vim.fault.NotFound:
--> Result:
--> (vim.fault.NotFound) {
-->    dynamicType = <unset>,
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = (vmodl.LocalizableMessage) [
-->       (vmodl.LocalizableMessage) {
-->          dynamicType = <unset>,
-->          key = "com.vmware.vim.vpxd.dvs.notFound.label",
-->          arg = (vmodl.KeyAnyValue) [
-->             (vmodl.KeyAnyValue) {
-->                dynamicType = <unset>,
-->                key = "type",
-->                value = "DVS",
-->             },
-->             (vmodl.KeyAnyValue) {
-->                dynamicType = <unset>,
-->                key = "value",
-->                value = "51 4d 2d 50 93 51 73 6b-46 47 d0 fa 09 af 88 fc",
-->             }
-->          ],
-->          message = <unset>,
-->       }
-->    ],
-->    msg = ""
--> }
--> Args:
-->
```
I found a related VMware KB article; this was a known bug in vCenter Server 5.1 with no Update applied. According to the resolution section, it was fixed in vCenter Server 5.1 Update 2 and in 5.5, so I raised another change in advance to upgrade the source vCenter Server to 5.1 Update 2. Even after the vCenter Server was upgraded to 5.1 Update 2, there was no luck. After consulting VMware Support, it turned out the dvSwitch version also had to be upgraded to 5.1. Once the dvSwitch was upgraded, the export/import worked without a problem.
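For reference, the dvSwitch version upgrade can also be driven through the API rather than the Web Client. I did it in the Web Client, so the following pyVmomi sketch is only my assumption of the equivalent call:

```python
# Hedged sketch: upgrade a distributed switch from version 5.0.0 to 5.1.0 via the API.
# I performed the upgrade in the Web Client; this is my assumption of the equivalent call.
from pyVim.task import WaitForTask
from pyVmomi import vim

def upgrade_dvs(si, dvs_name, target_version="5.1.0"):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.DistributedVirtualSwitch], True)
    dvs = next(d for d in view.view if d.name == dvs_name)
    spec = vim.dvs.ProductSpec(version=target_version)
    # "upgrade" asks vCenter to move the switch to the product spec given above
    WaitForTask(dvs.PerformDvsProductSpecOperation_Task(operation="upgrade", productSpec=spec))
```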
After this pre-work, I was quite confident about the rest of the migration. On the day of the work, I disconnected and removed esxi5.1_A from the source cluster and added it to the cluster in vcs5.5. The next step was to rejoin esxi5.1_A to the dvSwitch imported into vcs5.5. Before and during this work I was constantly pinging a few virtual machines and the ESXi server to make sure there was no outage. The work itself was quite simple (the API equivalent is sketched after the list):
- Navigate to the Networking view
- Right-click on dvSwitch_VM_Network and click Add Host
- Ignore migrating the VMkernel ports and VM networking, click Next and Finish
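The same Add Host operation maps to a dvSwitch reconfiguration in the API. A rough pyVmomi sketch, where the uplink names and the exact vim.dvs.* spec types are my assumptions:

```python
# Hedged sketch: add a host to an existing dvSwitch with its uplinks, without
# migrating VMkernel ports or VM networking (mirrors the wizard choices above).
from pyVim.task import WaitForTask
from pyVmomi import vim

def add_host_to_dvs(dvs, host, pnic_devices=("vmnic0", "vmnic1")):
    # Uplink NIC names are placeholders; use the physical NICs actually cabled for this switch.
    backing = vim.dvs.HostMember.PnicBacking(
        pnicSpec=[vim.dvs.HostMember.PnicSpec(pnicDevice=nic) for nic in pnic_devices])
    host_member = vim.dvs.HostMember.ConfigSpec(operation="add", host=host, backing=backing)
    spec = vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec(
        configVersion=dvs.config.configVersion, host=[host_member])
    WaitForTask(dvs.ReconfigureDvs_Task(spec))
```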
Yup, another issue occurred. The error messages are attached below:
vDS operation failed on host prod.esxi.com, Received SOAP response fault from [<cs p:00000000f3627820, TCP:prod.esxi.com:443>]: invokeHostTransactionCall An error occurred during host configuration. got (vim.fault.PlatformConfigFault) exception An error occurred during host configuration.
Operation failed, diagnostics report: Unable to set network resource pools list (8) (netsched.pools.persist.nfs;netsched.pools.persist.mgmt;netsched.pools.persist.vmotion;netsched.pools.persist.vsan;netsched.pools.persist.hbr;netsched.pools.persist.iscsi;netsched.pools.persist.vm;netsched.pools.persist.ft;) to dvswitch id (48 59 2d 50 06 30 c4 39-96 74 bb 0e c1 73 fc 87); Status: Busy
Screenshot:
Investigating the log, it looked like the dvSwitch imported into vcs5.5 had an issue with its network resource pools, i.e. NIOC. Gotcha: the custom NIOC resource pool wasn't completely imported. Hence, I created one manually (with exactly the same configuration as defined in vcs5.1) and mapped it to the appropriate portgroup. However, there was no luck; I still hit the same error as above. My guess is that the resource pool configuration has to come through the import rather than being recreated manually.
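One way to confirm what actually made it across is to read the NIOC state and the resource pools straight off the imported dvSwitch. A small, read-only pyVmomi sketch, again an illustration rather than something from the original change:

```python
# Hedged sketch: inspect NIOC state and network resource pools on the imported dvSwitch,
# to compare against what is defined on the source vCenter Server.
from pyVmomi import vim

def dump_nioc(dvs):
    """Print whether NIOC is enabled and list the network resource pools on the dvSwitch."""
    print("NIOC enabled:", dvs.config.networkResourceManagementEnabled)
    for pool in dvs.networkResourcePool or []:
        alloc = pool.allocationInfo
        print("pool {} ({}): shares={}, limit={}".format(
            pool.key, pool.name, alloc.shares.shares, alloc.limit))
```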
Workaround
I suspected that the virtual machines using the portgroup backed by the custom resource pool were causing the issue. So the next attempt was to update the dvSwitch membership on an ESXi server in maintenance mode, i.e. with no virtual machines running on it. I was correct: with no virtual machines on the host, the dvSwitch update succeeded. Once this was done, the next update required was dvSwitch_iSCSI_Network. The expected warning, "migration of iSCSI VMkernel ports can cause an APD (all paths down) state on some LUNs", appeared as attached below. However, since we maintained the identifiers of the dvSwitch and its portgroups, it was safe to continue without resolving the errors:
After the work on esxi5.1_A, I migrated esxi5.1_B to vcs5.5 and placed it in maintenance mode to evacuate its virtual machines to esxi5.1_A. Once the vMotions had finished, I updated the dvSwitch membership on esxi5.1_B and it was successful!
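Since the dvSwitch update only succeeds on a host with no running virtual machines, the check and the evacuation can be scripted before retrying the update. A hedged pyVmomi sketch:

```python
# Hedged sketch: confirm a host has no powered-on VMs and, if it does, put it into
# maintenance mode so vMotion/DRS evacuates them before the dvSwitch update is retried.
from pyVim.task import WaitForTask
from pyVmomi import vim

def evacuate_if_needed(host, timeout_seconds=3600):
    running = [vm.name for vm in host.vm
               if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn]
    if running:
        print("Powered-on VMs still on {}: {}".format(host.name, running))
        # Entering maintenance mode waits for the VMs to be moved off the host
        WaitForTask(host.EnterMaintenanceMode_Task(timeout=timeout_seconds))
    else:
        print("{} has no running VMs; safe to update its dvSwitch membership".format(host.name))
```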
Final Migration Plan
The following is the final migration plan (a scripted sketch of the HA/DRS steps follows the list):
- Export dvSwitch_VM_Network & dvSwitch_iSCSI_Network and import them to vcs5.5 (pre-work)
- Ensure vCenter Server is 5.1 Update 2 or above on the source
- Create a new cluster in vcs5.5, same configuration as in vcs5.1
- Disable HA & DRS on the cluster in vcs5.1
- Place esxi5.1_A in maintenance mode
- Disconnect & remove esxi5.1_A from vcs5.1 and register it in the vcs5.5 cluster
- Rejoin esxi5.1_A to dvSwitch_VM_Network and dvSwitch_iSCSI_Network in vcs5.5
- Place esxi5.1_B in maintenance mode
- Disconnect & remove esxi5.1_B from vcs5.1 and register it in the vcs5.5 cluster
- Take esxi5.1_A and esxi5.1_B out of maintenance mode
- Enable DRS only, set to Fully Automated
- Place esxi5.1_B in maintenance mode
- Once done, rejoin esxi5.1_B to the imported dvSwitch_VM_Network and dvSwitch_iSCSI_Network in vcs5.5
- Enable HA on the cluster in vcs5.5
- Delete the cluster instance in vcs5.1
- Repeat the steps above for the rest of the clusters
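For the HA/DRS steps in the plan, the cluster reconfiguration can be scripted too. Another hedged pyVmomi sketch; I made these changes through the Web Client, so the spec below is my assumption of the equivalent API call:

```python
# Hedged sketch: disable/enable HA and DRS on a cluster via ReconfigureComputeResource_Task.
# 'cluster' is a vim.ClusterComputeResource looked up from a connected ServiceInstance.
from pyVim.task import WaitForTask
from pyVmomi import vim

def set_cluster_services(cluster, ha_enabled, drs_enabled, drs_behavior="fullyAutomated"):
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(enabled=ha_enabled),
        drsConfig=vim.cluster.DrsConfigInfo(enabled=drs_enabled,
                                            defaultVmBehavior=drs_behavior))
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))

# Following the plan: DRS only (fully automated) first, then HA at the end
# set_cluster_services(cluster, ha_enabled=False, drs_enabled=True)
# set_cluster_services(cluster, ha_enabled=True, drs_enabled=True)
```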
Recommended Post Work
I think there is another bug here: the migrated ESXi servers don't recognise their current network settings, as in the screenshot attached below:
There is no problem selecting portgroups in a VM's configuration, but I found that if these ESXi servers are part of vCAC and you try to create reservations, the network adapters don't show up 😦 I recommend restarting the vCenter Server, as it fixes the issue above (do not tell me you have vCenter Heartbeat installed!).
Wrap-Up
I hope the real-life migration scenario described above helps; if you want other examples, they can be found in the following posts:
Any questions or requests for clarification are more than welcome.