ESXi Upgrade – Virtual Machines Drop From Network

Introduction

I was in the process of upgrading ESXi servers from 5.0 to 5.5 and noticed that a number of virtual machines were randomly disconnected from the network.

Throughout this blog post, I will be going through:

  • Symptom
  • Workaround

 

Symptom

The cluster had 8 nodes (n+1), and the plan was to upgrade one ESXi server per day. The day after the first ESXi server was upgraded to 5.5, a few incidents were raised saying that some virtual machines weren't accessible over the network.

To summarise:

  • One ESXi server was upgraded to 5.5
  • Some virtual machines were disconnected from the network across multiple ESXi servers
  • ESXi servers were still manageable via vCenter Server
  • Virtual machines were also manageable, e.g. they could be reconfigured

My first step was to check the ESXi logs, where I found some interesting lines in vmkernel.log:

2014-11-07T22:07:47.202Z cpu16:1804972)Net: 1652: connected test_vm1.eth1 eth1 to vDS, portID 0x200002e
2014-11-07T22:07:47.202Z cpu16:1804972)Net: 1985: associated dvPort 513 with portID 0x200002e
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:247: client test_vm1.eth1 requested mac address change to 00:00:00:00:00:00 on port 0x200002e, disallowed by vswitch policy
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:356: client test_vm1.eth1 has policy vialations on port 0x200002e. Port is blocked
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:247: client test_vm1.eth1 requested mac address change to 00:00:00:00:00:00 on port 0x200002e, disallowed by vswitch policy
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:356: client test_vm1.eth1 has policy vialations on port 0x200002e. Port is blocked
2014-11-07T22:07:47.202Z cpu16:1804972)WARNING: NetPort: 1245: failed to enable port 0x200002e: Bad parameter
2014-11-07T22:07:47.202Z cpu16:1804972)NetPort: 1427: disabled port 0x200002e
2014-11-07T22:07:47.202Z cpu16:1804972)WARNING: Net: vm 1804972: 377: cannot enable port 0x200002e: Bad parameter
2014-11-07T23:50:39.821Z cpu4:4151)Net: 2191: dissociate dvPort 513 from port 0x200002e
2014-11-07T23:50:39.821Z cpu4:4151)Net: 2195: disconnected client from port 0x200002e
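
If many virtual machines are affected, scanning the log by eye gets tedious. Below is a minimal sketch, assuming the log lines look exactly like the excerpt above, that pulls out each vNIC whose port was blocked. The function name find_blocked_ports and the regular expression are my own illustration and not part of any VMware tooling.

```python
import re

# Matches the "Port is blocked" lines from the excerpt. The word after
# "has policy" is matched loosely with \w+ so its spelling does not matter.
BLOCKED_RE = re.compile(
    r"L2Sec_EnforcePortCompliance:\d+: client (?P<client>\S+) has policy \w+ "
    r"on port (?P<port>0x[0-9a-fA-F]+)\. Port is blocked"
)


def find_blocked_ports(lines):
    """Return unique (client, port) pairs for blocked ports, in first-seen order."""
    seen = []
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match:
            pair = (match.group("client"), match.group("port"))
            if pair not in seen:
                seen.append(pair)
    return seen
```

It accepts any iterable of lines, so an open file handle on a copy of vmkernel.log works as input.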

While Googling, I found a KB article related to this issue. The problem was quite simple: ESXi 4.x and 5.0 only support DVUplink names shorter than 31 characters and, guess what, our DVUplink names were longer than 31 characters!
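
To check for this ahead of an upgrade, the length test itself is trivial. Here is a minimal sketch, assuming the limit is exactly as described above (names must be shorter than 31 characters) and that the DVUplink names have already been collected into a list by some other means. The function and constant names are hypothetical.

```python
# Assumption taken from this post: ESXi 4.x/5.0 hosts only cope with
# DVUplink names shorter than 31 characters, i.e. at most 30.
MAX_UPLINK_NAME_LEN = 30


def too_long_uplink_names(names, limit=MAX_UPLINK_NAME_LEN):
    """Return the names whose length exceeds the limit, preserving order."""
    return [name for name in names if len(name) > limit]
```

An empty result would suggest that the names are short enough for the older hosts.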

After investigating the history of vMotions, I found the root cause: DRS was load balancing workloads and, during this process, some of the virtual machines were moved from the ESXi 5.5 server to ESXi 5.0 servers.
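
The same check can be expressed as a small filter over migration history. This is only a sketch: the Migration record below is a made-up shape for illustration, and it assumes the source and destination ESXi versions of each vMotion have already been looked up.

```python
from typing import NamedTuple


class Migration(NamedTuple):
    """Hypothetical record of one vMotion; not a vCenter data type."""
    vm: str
    source_version: str  # ESXi version of the host the VM left
    dest_version: str    # ESXi version of the host the VM landed on


def moves_from_55_to_50(migrations):
    """Return the migrations that went from an ESXi 5.5 host to an ESXi 5.0 host."""
    return [
        m for m in migrations
        if m.source_version.startswith("5.5") and m.dest_version.startswith("5.0")
    ]
```

Any virtual machine returned by this filter is a candidate for the network drop described in this post.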

Workaround

The resolution the above KB article suggests is to patch ESXi 5.0 to Patch 5. However, the patch requires a reboot, which means it would take approximately the same time as upgrading ESXi to 5.5. The better option was to carry on upgrading ESXi 5.0 to 5.5 and not allow virtual machines running on ESXi 5.5 to move to ESXi 5.0 servers, i.e. set DRS to manual. The only problem with this was that manual vMotion would be required to evacuate virtual machines before placing an ESXi server in maintenance mode.

Alternatively, it was possible to rename the DVUplinks to be shorter than 31 characters and keep the benefit of fully automated DRS. But our standard was to leave the DVUplink names at their defaults.

After a discussion, we came up with a workaround:

  1. Set DRS to manual
  2. Before patching an ESXi server, set DRS to fully automated
  3. Place the ESXi server in maintenance mode
  4. Set DRS back to manual once the maintenance mode task kicks off the evacuation of virtual machines
  5. Upgrade the ESXi server
  6. Repeat steps 2 to 5 for each remaining ESXi server

This way, I could evacuate virtual machines to other ESXi servers automatically without worrying about virtual machines being vMotioned from 5.5 to 5.0.
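
The order of operations in the workaround can be sketched in code. This is only an illustration of the sequence: RecordingCluster is a hypothetical stand-in that records calls instead of talking to vCenter Server, and every method name on it is my own invention. A real script would replace it with actual vCenter calls and would also have to wait for the host to finish entering maintenance mode before upgrading it.

```python
class RecordingCluster:
    """Hypothetical stand-in for a vCenter client; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def set_drs(self, mode):
        self.calls.append(("set_drs", mode))

    def enter_maintenance_mode(self, host):
        # In a real environment this is what starts the evacuation.
        self.calls.append(("enter_maintenance_mode", host))

    def upgrade(self, host):
        self.calls.append(("upgrade", host))


def rolling_upgrade(cluster, hosts):
    """Apply the workaround's steps to each host in turn."""
    cluster.set_drs("manual")                 # step 1
    for host in hosts:
        cluster.set_drs("fullyAutomated")     # step 2
        cluster.enter_maintenance_mode(host)  # step 3: evacuation begins
        cluster.set_drs("manual")             # step 4: once evacuation has started
        # A real script would wait here until the host is in maintenance mode.
        cluster.upgrade(host)                 # step 5
```

Calling rolling_upgrade(RecordingCluster(), ["esx01", "esx02"]) with two made-up host names leaves the full sequence in the calls list, which makes it easy to review the order before wiring it up to anything real.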

Hope this helped and feel free to leave a comment.
