PowerCLI ESXi NTP Service Query

Last week, I was asked to write up a script to check the NTP settings across all ESXi servers. The script had to output:

Cluster
ESXi
IP Address (NTP server)
Policy
Running

It looked straightforward, but one thing wasn’t clear. While scripting, I found that Get-VMHostService -VMHost <VMHost[]> | %{$_.Policy} returns one of three policy values:

On
Off
Automatic

So the output looked like:

Cluster     : cluster1
ESXi        : esxi1.test.com
NTP Server  : ntp.test.com
Service     : NTP Daemon
Policy      : on
Running     : True

To end users, the policy values on, off and automatic mean nothing on their own, so they needed more explanation. After digging in, I discovered the following:

On => Start and stop with host
Off => Start and stop manually
Automatic => Start automatically if any ports are open, and stop when all ports are closed

Hence, I used a simple switch statement to translate the above:

switch ($policy) {
  "off" { "Start and stop manually" }
  "on" { "Start and stop with host" }
  "automatic" { "Start automatically if any ports are open, and stop when all ports are closed" }
}

And the final script is:

Function NTP-Query
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory=$true)]
    [string]$cluster
  )

  foreach ($esxi in (Get-Cluster -Name $cluster | Get-VMHost | Sort-Object Name)) {
    # Pick the NTP daemon out of the host's services
    $service = Get-VMHostService -VMHost $esxi | Where-Object {$_.Key -eq "ntpd"}
    # Translate the raw startup policy into a readable description
    $policy = switch ($service.Policy) {
                "off" { "Start and stop manually" }
                "on" { "Start and stop with host" }
                "automatic" { "Start automatically if any ports are open, and stop when all ports are closed" }
              }
    # A host can have several NTP servers configured, so join them into one string
    $ntp = (Get-VMHostNtpServer -VMHost $esxi) -join ", "
    $esxi | Select-Object @{N="Cluster";E={$_.Parent}},
                          @{N="ESXi";E={$_.Name}},
                          @{N="NTP Server";E={$ntp}},
                          @{N="Service";E={$service.Label}},
                          @{N="Policy";E={$policy}},
                          @{N="Running";E={$service.Running}}
  }
}
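
Since the function emits one object per host, the results can be filtered like any other PowerShell objects. As a minimal sketch, the helper below (Get-NtpProblemHost is a hypothetical name, not part of PowerCLI) keeps only the hosts whose NTP service is not running or whose policy is not "Start and stop with host". Treating that policy as the desired one is an assumption, so adjust it to your own standard. The sample hosts are made up for illustration.

```powershell
# Hypothetical helper: filters NTP-Query style objects down to hosts needing attention.
# Assumption: the desired state is a running service with the "Start and stop with host" policy.
function Get-NtpProblemHost {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
    [psobject]$HostReport,
    [string]$ExpectedPolicy = "Start and stop with host"
  )
  process {
    # Pass the object through only if the service is stopped or the policy differs
    if (-not $HostReport.Running -or $HostReport.Policy -ne $ExpectedPolicy) {
      $HostReport
    }
  }
}

# Sample objects shaped like the NTP-Query output (made-up hosts, for illustration only)
$sample = @(
  [pscustomobject]@{ ESXi = "esxi1.test.com"; Policy = "Start and stop with host"; Running = $true },
  [pscustomobject]@{ ESXi = "esxi2.test.com"; Policy = "Start and stop manually"; Running = $false }
)
$problems = @($sample | Get-NtpProblemHost)
$problems | Format-List
```

Against a live cluster, the same filter would be used as NTP-Query -cluster cluster1 | Get-NtpProblemHost.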

Hope this helps those of you writing a script to check NTP settings as well as the service policy.

webCommander, SSL Certificate Installation

webCommander, a VMware Fling, lets users execute PowerCLI scripts to run reports, commands, etc. via a GUI. This means that managers or higher-level people can easily generate reports based on their needs!

Last week, I installed it and migrated most of my scripts. Before releasing it to others, I decided to install an SSL certificate, as it passes the password in plain text. webCommander runs on IIS 8, and it was easy enough to generate a .csr file and import the .crt file. I won’t go through the exact steps; refer to http://www.entrust.net/knowledge-base/technote.cfm?tn=8713 or one of the many other sites that walk through it.

After installing the certificate and configuring the site bindings, I could access the webCommander site over https. I ran several scripts to make sure everything worked. During the testing, I found two issues:

Submit button not loading the script
Any buttons on workflow not doing anything

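While debugging the code, I found that two files, webCmd.xsl and workflow.html, were loading jQuery using http! On a page served over https, browsers typically block scripts requested over plain http as mixed content, which explains why the buttons did nothing. Details are shown below.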

webCmd.xsl

workflow.html

After modifying both files to load jQuery over https, the issues were solved.
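
The kind of change involved can be sketched as follows. This is only an illustration: Convert-ScriptSrcToHttps is a hypothetical helper, and the script URL in the sample is made up, not the actual reference found in webCmd.xsl or workflow.html.

```powershell
# Hypothetical helper: rewrites src="http://..." attributes to https in a block of markup.
function Convert-ScriptSrcToHttps {
  param(
    [Parameter(Mandatory=$true)]
    [string]$Markup
  )
  # Only src attributes are touched, so namespace URIs such as
  # xmlns:xsl="http://www.w3.org/1999/XSL/Transform" in an .xsl file stay as they are.
  $Markup -replace 'src=(["''])http://', 'src=$1https://'
}

# Made-up sample line; the real jQuery URL in webCommander may differ
$before = '<script src="http://cdn.example.com/jquery.min.js"></script>'
$after  = Convert-ScriptSrcToHttps -Markup $before
$after
```

Editing the two files by hand works just as well; either way, keep a backup copy of the originals before changing them.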

Hope this helps people who are planning to install an SSL certificate on webCommander.

Post ESXi 5.5 Upgrade, Virtual Machines Dropping From Network Randomly

Last weekend, I upgraded 4 ESXi servers from 5.0 U2 to 5.5.

After the upgrade, everything looked fine. There were no issues with the virtual machines, the new HA agents were installed on the ESXi servers without any problems, etc.

A few hours later, I was called out because a number of virtual machines had dropped from the network. Usually when this happens, all I have to do is vMotion the virtual machines to other ESXi servers or disconnect/reconnect the virtual NIC. That fixed the issue, but a few hours later I was called out again with the same problem!

The first thing I wanted to check was the guest OS, to see what was happening inside it, and luckily there was a Windows server that I had access to. I listed the ARP table first and, as expected, there were no entries at all. I also checked Device Manager, and the interesting thing was that the network adapter was being uninstalled! This gave me the idea that it might be:

VMware Tools issue
VMXNET3 issue

But these weren’t enough to prove anything; I needed something specific, so I started looking into the logs. While investigating vmkernel.log, I found some interesting lines:

2014-02-04T04:30:24.535Z cpu19:33048)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 260 pkts to dst 0x33554441 during mirroring: Out of slots
2014-02-04T04:30:24.535Z cpu21:36469)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 223 pkts to dst 0x33554441 during mirroring: Out of slots
2014-02-04T04:30:24.589Z cpu21:36994)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 258 pkts to dst 0x33554441 during mirroring: Out of slots
2014-02-04T04:30:24.590Z cpu17:36997)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 1 pkts to dst 0x33554441 during mirroring: Out of slots
2014-02-04T04:30:24.590Z cpu22:36848)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 251 pkts to dst 0x33554441 during mirroring: Out of slots

I could see that port 0x33554441 on the dvSwitch was consuming all available slots. After checking, I found that port 0x33554441 belonged to the vADM (VMware Application Discovery Manager) collector. I powered off all collectors and monitored the cluster for 5 to 6 hours, and the cluster stabilised.
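
To confirm which port the failed mirror traffic is headed for, the "Out of slots" messages can be tallied per destination port. The following is a rough sketch, assuming the log lines have the same shape as the excerpt above; Measure-MirrorDrop is a hypothetical helper name.

```powershell
# Hypothetical helper: sums the packets that failed to mirror, grouped by destination port.
function Measure-MirrorDrop {
  param(
    [Parameter(Mandatory=$true)]
    [string[]]$LogLine
  )
  $totals = @{}
  foreach ($line in $LogLine) {
    # Matches e.g. "failed to output 260 pkts to dst 0x33554441 during mirroring: Out of slots"
    if ($line -match 'failed to output (\d+) pkts to dst (0x[0-9a-fA-F]+) during mirroring: Out of slots') {
      $dst = $Matches[2]
      # A missing key yields $null, which casts to 0
      $totals[$dst] = [int]$totals[$dst] + [int]$Matches[1]
    }
  }
  foreach ($dst in $totals.Keys) {
    [pscustomobject]@{ Port = $dst; DroppedPackets = $totals[$dst] }
  }
}

# Two of the lines from the excerpt above
$lines = @(
  '2014-02-04T04:30:24.535Z cpu21:36469)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 223 pkts to dst 0x33554441 during mirroring: Out of slots',
  '2014-02-04T04:30:24.589Z cpu21:36994)MirrorThrottled.etherswitch: MirrorToPorts:3386: session legacy_promiscuous: failed to output 258 pkts to dst 0x33554441 during mirroring: Out of slots'
)
$result = @(Measure-MirrorDrop -LogLine $lines)
$result
```

In practice the lines would come from a copy of vmkernel.log read with Get-Content rather than from a hard-coded array.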

In summary, if you have deployed vADM collectors in the cluster and face the issue above, make sure you turn them off. It’s not a permanent solution, but it will stop people complaining about network outages.

In a few days, I will be getting an answer from the vADM team and will update the post.

Update

The solution to this problem is to change the network adapter type from VMXNET2 to VMXNET3.