PowerCLI Report Tip – Part 1

Introduction

For the last few years, I have been writing a lot of PowerCLI scripts to automate repeated tasks and to produce custom reports for managers to review the infrastructure. Over the next four posts in this series, I will be sharing a number of tips on how to write efficient and tidy scripts.
This is the first post in the series, and it discusses ways to improve performance.
Note: PowerCLI 5.8 Release 1 & vCenter Server 5.5 were used.

Getting Started

Let’s get started with a simple script:

$report = foreach ($vm in (Get-VM -Location (Get-Cluster -Name cluster1))) {
   $vm | Select-Object @{N="ESXi";E={$vm.VMHost.Name}}, Name, NumCPU, MemoryGB,
       @{N="Datastore";E={ [string]::Join(",", (Get-Datastore -VM $vm | ForEach-Object { $_.Name })) }}
}

An example of the output is shown below:

ESXi,Name,NumCPU,MemoryGB,Datastore
ESXi_test1,test1,1,1,datastore1,datastore2,datastore3
ESXi_test1,test2,1,1,datastore1
ESXi_test2,test3,1,4,datastore4,datastore5

For the testing, I selected a cluster with 300 virtual machines and measured how long the script above takes to run: 5 minutes. Five minutes may look OK, but what about 2,000~3,000 virtual machines? That would take the best part of an hour, which is very inefficient.

How could we improve the performance?

Trick

Many people might assume that retrieving the full, unfiltered output (Get-Datastore) takes longer than applying a filter (Get-Datastore -VM $vm).

Is this the case? Let’s have a look:

  • Get-Datastore => retrieving all 565 datastores took 2.2 seconds
  • Get-Datastore -VM "VM" => retrieving the datastores for a single virtual machine took 8 seconds

A surprising result, right? Get-Datastore without a filter is approximately 4 times faster, even though it returned all 565 datastores. With this in mind, the script above calls Get-Datastore -VM once per virtual machine, 300 times in total, and those repeated filtered calls account for almost all of the 5-minute run time.
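
If you want to reproduce this kind of comparison in your own environment, Measure-Command is a quick way to time each call. A minimal sketch, where "VM" is a placeholder for a real virtual machine name:

(Measure-Command { Get-Datastore }).TotalSeconds           # time the unfiltered call
(Measure-Command { Get-Datastore -VM "VM" }).TotalSeconds  # time the filtered call for one VM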

To improve the performance, one approach I came up with was to:

  • Save the Get-Datastore output in a variable, e.g. $datastore_list = Get-Datastore
  • Use $datastore_list to work out which datastores are allocated to each virtual machine

This way, instead of executing Get-Datastore -VM 300 times, the script runs Get-Datastore once, saves the result in a variable and queries datastore information from that variable. This looks much more efficient, but how can we achieve it?

If you look closely at the properties of the objects returned by Get-VM (run Get-VM | Select * to view all properties; this will be covered in depth in the next post), there is a property called “DatastoreIdList”. Each datastore has a unique datastore ID, and Get-VM exposes these ID values. This means we can:

  • Run a foreach loop over the IDs in DatastoreIdList
  • If an ID matches a datastore ID in the $datastore_list variable, output that datastore

Translating the above into a PowerCLI command:

$datastore_list = Get-Datastore
(Get-VM -Name "VM").DatastoreIdList | ForEach-Object {
    $datastore_id = $_
    $datastore_list | Where-Object { $_.Id -match $datastore_id }
}

Converting the script from the Getting Started section:

$datastore_list = Get-Datastore
$report = foreach ($vm in (Get-VM -Location (Get-Cluster -Name cluster1))) {
   $vm | Select-Object @{N="ESXi";E={$_.VMHost.Name}}, Name, NumCPU, MemoryGB,
       @{N="Datastore";E={ [string]::Join(",", ($_.DatastoreIdList | ForEach-Object { $datastore_id = $_; $datastore_list | Where-Object { $_.Id -match $datastore_id } } | ForEach-Object { $_.Name })) }}
}

The above script took 27 seconds and produced the same output, making it more than 10 times faster than the original.
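
If you want to squeeze out a little more, one further step (which I have not timed against this cluster) is to build a hashtable keyed on datastore ID, so that $datastore_list is not scanned once per virtual machine. A sketch, assuming the IDs in DatastoreIdList match the Id property returned by Get-Datastore exactly; if they do not, fall back to the -match approach above:

# Build the lookup table once: datastore ID -> datastore name
$datastore_by_id = @{}
foreach ($ds in Get-Datastore) { $datastore_by_id[$ds.Id] = $ds.Name }

$report = foreach ($vm in (Get-VM -Location (Get-Cluster -Name cluster1))) {
   $vm | Select-Object @{N="ESXi";E={$_.VMHost.Name}}, Name, NumCPU, MemoryGB,
       @{N="Datastore";E={ [string]::Join(",", ($_.DatastoreIdList | ForEach-Object { $datastore_by_id[$_] })) }}
}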

Wrap-Up

This post discussed a few ways to improve the performance of PowerCLI scripts:

  • Instead of applying a filter on every call, retrieve the whole output once and reuse it
  • Avoid executing the same command over and over
  • Take a closer look at object properties to avoid running extra commands

Hope this helped. In the next post, I will deep dive into properties.

ESXi Upgrade – Virtual Machines Drop From Network

Introduction

While upgrading ESXi servers from 5.0 to 5.5, I noticed a number of virtual machines randomly dropping off the network.

Throughout this blog post, I will be going through:

  • Symptom
  • Workaround


Symptom

The cluster had 8 nodes (N+1), and the plan was to upgrade one ESXi server per day. The day after the first ESXi server was upgraded to 5.5, a few incidents were raised reporting that some virtual machines were not accessible over the network.

To summarise:

  • One ESXi server had been upgraded to 5.5
  • Some virtual machines were disconnected from the network, across multiple ESXi servers
  • The ESXi servers were still manageable via vCenter Server
  • The virtual machines were also manageable, e.g. they could be reconfigured

The first thing I did was check the ESXi logs, and I found some interesting lines in vmkernel.log:

2014-11-07T22:07:47.202Z cpu16:1804972)Net: 1652: connected test_vm1.eth1 eth1 to vDS, portID 0x200002e
2014-11-07T22:07:47.202Z cpu16:1804972)Net: 1985: associated dvPort 513 with portID 0x200002e
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:247: client test_vm1.eth1 requested mac address change to 00:00:00:00:00:00 on port 0x200002e, disallowed by vswitch policy
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:356: client test_vm1.eth1 has policy violations on port 0x200002e. Port is blocked
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:247: client test_vm1.eth1 requested mac address change to 00:00:00:00:00:00 on port 0x200002e, disallowed by vswitch policy
2014-11-07T22:07:47.202Z cpu16:1804972)etherswitch: L2Sec_EnforcePortCompliance:356: client test_vm1.eth1 has policy violations on port 0x200002e. Port is blocked
2014-11-07T22:07:47.202Z cpu16:1804972)WARNING: NetPort: 1245: failed to enable port 0x200002e: Bad parameter
2014-11-07T22:07:47.202Z cpu16:1804972)NetPort: 1427: disabled port 0x200002e
2014-11-07T22:07:47.202Z cpu16:1804972)WARNING: Net: vm 1804972: 377: cannot enable port 0x200002e: Bad parameter
2014-11-07T23:50:39.821Z cpu4:4151)Net: 2191: dissociate dvPort 513 from port 0x200002e
2014-11-07T23:50:39.821Z cpu4:4151)Net: 2195: disconnected client from port 0x200002e

While Googling, I found a VMware KB article related to this issue, which can be found here. The problem was quite simple: ESXi 4.x and 5.0 only support dvUplink names of up to 31 characters and, guess what, our dvUplink names were longer than 31 characters!
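
To check whether an environment is affected before upgrading, the uplink port names and their lengths can be listed with PowerCLI. This is only a hedged sketch: it assumes the Name of the uplink ports returned by Get-VDPort -Uplink is the name the KB refers to:

# List uplink port names per dvSwitch and flag any longer than 31 characters
Get-VDSwitch | Get-VDPort -Uplink |
    Select-Object Name, @{N="NameLength";E={$_.Name.Length}} |
    Where-Object { $_.NameLength -gt 31 }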

Looking at the vMotion history, the immediate cause was that DRS had been load-balancing workloads, and during this process some virtual machines were moved from the upgraded ESXi 5.5 server back onto ESXi 5.0 servers.

Workaround

The resolution the above KB article suggests is to update ESXi 5.0 to Patch 5. However, the patch requires a reboot, which means it would take approximately the same time as upgrading straight to ESXi 5.5. The better option was to carry on upgrading ESXi 5.0 to 5.5 and to stop virtual machines running on ESXi 5.5 from moving back to ESXi 5.0 servers, i.e. set DRS to manual. The only problem with this was that manual vMotion would then be required to evacuate virtual machines before placing an ESXi server into maintenance mode.

Alternatively, it was possible to rename the dvUplinks to 31 characters or fewer and keep fully automated DRS, but our standard was to leave dvUplink names at their defaults.

After a discussion, we came up with a workaround:

  1. Set DRS to manual
  2. When an ESXi server is due to be upgraded, set DRS to fully automated
  3. Place that ESXi server into maintenance mode
  4. Set DRS back to manual once the maintenance mode task has kicked off the evacuation of virtual machines
  5. Upgrade the ESXi server
  6. Repeat steps 2~5 for the remaining ESXi servers
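
Since DRS is toggled several times per host, the switching itself can be scripted with PowerCLI. A minimal sketch, assuming the cluster is called cluster1:

# Steps 1 and 4: switch DRS to manual
Get-Cluster -Name cluster1 | Set-Cluster -DrsAutomationLevel Manual -Confirm:$false

# Step 2: switch DRS back to fully automated before placing a host into maintenance mode
Get-Cluster -Name cluster1 | Set-Cluster -DrsAutomationLevel FullyAutomated -Confirm:$false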

This way, virtual machines were evacuated to other ESXi servers automatically, without the worry of virtual machines being vMotioned from 5.5 back to 5.0.

Hope this helped and feel free to leave a comment.

vSphere Migration Scenario #2

Introduction

A few weeks ago, I was involved in decommissioning old ESXi servers that were out of warranty, and for this work I had to come up with a migration plan to evacuate the virtual machines to a new cluster.

Throughout this blog post, I will be going through:

  • Requirements
  • Infrastructure
  • Tactic
  • Migration Plan

Requirements

There was only one requirement from the virtual machine owners: no outage during the migration (the loss of one or two packets was acceptable).

Infrastructure

The VMware infrastructure was set up as follows:

  1. Two clusters in the same vCenter Server (version 5.5), each with 4 ESXi servers (version 5.5)
    • Source_Cluster
    • Destination_Cluster
  2. Storage is FC-based and the clusters are zoned to different IO groups (IBM SVC)
    • Source_Cluster in IO group 1
    • Destination_Cluster in IO group 0
  3. Each cluster has its own dvSwitch with 2 x 10GbE uplinks
    • Source_dvSwitch (version 5.0)
    • Destination_dvSwitch (version 5.5)
  4. A LAG is configured on both Source_Cluster and Destination_Cluster, but without LACP

Tactic

There were two major areas to look at: the dvSwitch, and shared storage between the clusters.

dvSwitch

The first attempt was, on a dedicated ESXi server, to pull one uplink out of Source_dvSwitch and add it to Destination_dvSwitch, while the management VMkernel port stayed on Source_dvSwitch. A few minutes later, the ESXi server was disconnected from vCenter Server and could no longer be pinged. What happened?

I first logged into the ESXi server via the shell, ran esxcli network ip neighbor list, and found some interesting output:

[Figure: output of esxcli network ip neighbor list]

The management VMkernel interface vmk0 could ping 172.27.3.252 and 172.27.3.253 but not 172.27.3.254, the gateway of the subnet. This was why the ESXi server was disconnected.

I have done the maths below, and I strongly recommend this blog post for understanding how the source and destination IP hash algorithm works.

Information:

  • Source ESXi server IP Address: 172.27.2.79
  • Destination IP Address #1: 172.27.3.253
  • Destination IP Address #2: 172.27.3.254

After converting them into Hex values:

  • Source ESXi server IP Address: 0xAC1B024F
  • Destination IP Address #1: 0xAC1B03FD
  • Destination IP Address #2: 0xAC1B03FE

Calculating XoR between source and destination IP addresses:

  • Source & Destination #1: 0x1B2
  • Source & Destination #2: 0x1B1

Finally, calculating MOD 2 (there are two uplinks) on the results above:

  • 0x1B2 MOD 2 = 0
  • 0x1B1 MOD 2 = 1
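
For reference, the same calculation can be scripted. The following is a minimal PowerShell sketch (Get-IpHashUplink is a hypothetical helper, not a PowerCLI cmdlet) of the route-based-on-IP-hash formula, (source IP XOR destination IP) modulo the number of uplinks:

function Get-IpHashUplink {
    param([string]$SourceIp, [string]$DestinationIp, [int]$UplinkCount = 2)

    # Convert a dotted-quad address to a 32-bit unsigned integer
    $toUInt32 = {
        param($ip)
        $bytes = ([System.Net.IPAddress]::Parse($ip)).GetAddressBytes()
        [Array]::Reverse($bytes)                 # network byte order -> host order
        [BitConverter]::ToUInt32($bytes, 0)
    }

    ((& $toUInt32 $SourceIp) -bxor (& $toUInt32 $DestinationIp)) % $UplinkCount
}

Get-IpHashUplink -SourceIp 172.27.2.79 -DestinationIp 172.27.3.253   # returns 0
Get-IpHashUplink -SourceIp 172.27.2.79 -DestinationIp 172.27.3.254   # returns 1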

Do you see the problem here? Traffic from the management VMkernel to 172.27.3.253 and to the gateway 172.27.3.254 hashes to different uplinks, and the gateway traffic landed on the uplink that had been removed and added to Destination_dvSwitch, so the management VMkernel lost connectivity to the gateway.

Since this was not a suitable approach, the decision was made to migrate each ESXi server from Source_dvSwitch to Destination_dvSwitch completely, rather than moving one uplink at a time.

Shared Storage

As the ESXi servers in Source_Cluster and Destination_Cluster were zoned to different IO groups, it wasn't possible to share a VMFS volume between the clusters for the migration.

There were two solutions to this:

  1. Dedicate one ESXi server in Source_Cluster and zone it into both IO groups, i.e. 0 and 1, for migration purposes
  2. Use the vCenter 5.5 ability to change both host and datastore in a single vMotion

To maximise the speed of the migration work, it was decided to go with the first option.

Final Migration Plan

The following was the final migration plan:

  1. Dedicate one ESXi server in Source_Cluster
  2. Zone in both IO groups
  3. Create a VMFS volume for the migration purpose
  4. vMotion virtual machines to the dedicated ESXi server from step 1
  5. Storage vMotion the virtual machines onto the VMFS volume created in step 3
  6. Migrate the dedicated ESXi server from Source_dvSwitch to Destination_dvSwitch.
  7. vMotion virtual machines to Destination_Cluster
  8. Migrate the dedicated ESXi server back to Source_dvSwitch
  9. Repeat steps 4~8 until all virtual machines have been migrated
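
Steps 4 and 5 can be driven with PowerCLI. A minimal sketch, where the dedicated host name (esxi-staging), the migration datastore name (migration_vmfs01) and the batch size of 10 are placeholders for illustration:

$staging_host = Get-VMHost -Name "esxi-staging"
$migration_ds = Get-Datastore -Name "migration_vmfs01"

# Step 4: vMotion a batch of virtual machines to the dedicated ESXi server
Get-VM -Location (Get-Cluster -Name Source_Cluster) | Select-Object -First 10 |
    Move-VM -Destination $staging_host

# Step 5: Storage vMotion them onto the migration VMFS volume
Get-VM -Location $staging_host | Move-VM -Datastore $migration_ds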

Wrap-Up

One thing I would highlight: the migration plan above is just a guideline. Every VMware infrastructure is different, and you will have to adapt the plan to yours.

Hope the real-life migration scenario described above helps; if you want another example, it can be found here.

If you have a question or problem, you are always welcome to leave a message.