Introduction
Link Aggregation Control Protocol (LACP) allows a network device to negotiate the automatic bundling of links by sending LACP packets to its peer. End devices with LACP enabled (in this case, ESXi and the physical switch) exchange frames called LACPDUs. How often LACPDUs are sent depends on the LACP timer: every 1 second for fast and every 30 seconds for slow. Why do we need LACP? The ultimate goal of LACP is to detect LAG misconfiguration and wiring errors. For more information about LAG & LACP, refer here.
LAG has been used with ESXi servers for a long time, but without LACP, as it wasn't supported. LACP was finally introduced in vSphere 5.1, but due to the SSO complexity and bugs, vSphere 5.1 was excluded from my list. With the introduction of the stable vSphere 5.5, as well as its enhanced LACP features, the decision was made to upgrade and start configuring LACP. For more information about the enhanced LACP features, refer to this link.
Enhanced LACP is only supported on dvSwitch 5.5, which means that earlier versions of the dvSwitch need to be upgraded. This requires vCenter Server & ESXi to be upgraded as well. For more information about upgrading & configuring LACP, there is a step-by-step guide written by Chris Wahl, which can be found here; I strongly recommend that post.
In this blog post, I will go through advanced LACP configuration using the esxcli command. Most of it is done in an SSH shell, so be prepared!
Advanced Configuration
Assuming LACP is configured, let's confirm it is operating. First of all, log in as root to the ESXi server where LACP is configured and run esxcli network vswitch dvs vmware lacp:
~ # esxcli network vswitch dvs vmware lacp
Usage: esxcli network vswitch dvs vmware lacp {cmd} [cmd options]

Available Namespaces:
  config    Command to get LACP configuration
  stats     Command to get LACP protocol statistics
  status    Command to get LACP port status
  timeout   Command to set LACP LAG timeout
- config get
- stats get
- status get
- timeout set
esxcli network vswitch dvs vmware lacp config get
The config get namespace shows the current overall LACP configuration:
- Name of the dvSwitch
- Name of the LAG
- ID of the LAG
- NICs which are in LAG
- Status, i.e. enabled or disabled
- Mode, i.e. active or passive
- Load balancing algorithm, there are 20 algorithms available
An example output is shown below.
~ # esxcli network vswitch dvs vmware lacp config get
DVS Name          LAG Name               LAG ID  NICs           Enabled  Mode    Load balance
----------------  ---------------------  ------  -------------  -------  ------  ------------
dvSwitch_TEST     default_uplink_pg_lag       0  vmnic0,vmnic1  true     Active  ---
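If you want to check this configuration from a script rather than by eye, the table can be split into fields. The following is only a minimal Python sketch: the sample text is copied from the example output above, the field names are hypothetical labels of my own, and it assumes that no column value contains whitespace.

```python
# Sample text copied from the example output above. On a real host this
# string would be the captured output of `lacp config get` (not shown here).
SAMPLE_CONFIG = """\
DVS Name          LAG Name               LAG ID  NICs           Enabled  Mode    Load balance
----------------  ---------------------  ------  -------------  -------  ------  ------------
dvSwitch_TEST     default_uplink_pg_lag       0  vmnic0,vmnic1  true     Active  ---
"""

# Hypothetical field names chosen for this sketch, one per output column.
FIELDS = ("dvs", "lag_name", "lag_id", "nics", "enabled", "mode", "load_balance")


def parse_lacp_config(text):
    """Return one dict per LAG row found in `lacp config get` output."""
    lags = []
    for line in text.splitlines():
        parts = line.split()
        # The header splits into more than seven words and the separator
        # row starts with dashes, so both are skipped here.
        if len(parts) != len(FIELDS) or parts[0].startswith("-"):
            continue
        row = dict(zip(FIELDS, parts))
        row["lag_id"] = int(row["lag_id"])
        row["nics"] = row["nics"].split(",")
        row["enabled"] = row["enabled"] == "true"
        lags.append(row)
    return lags


for lag in parse_lacp_config(SAMPLE_CONFIG):
    print(lag["dvs"], lag["lag_id"], lag["mode"], lag["nics"])
```

Splitting on whitespace keeps the sketch short; a LAG or dvSwitch name containing spaces would need column-position parsing instead.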
esxcli network vswitch dvs vmware lacp stats get
The stats get namespace shows real-time counters of the LACPDUs being sent and received. This can be used to ensure that LACPDUs are being exchanged with the physical switch. An example is shown below.
~ # esxcli network vswitch dvs vmware lacp stats get
DVSwitch          LAGID  NIC     Rx Errors  Rx LACPDUs  Tx Errors  Tx LACPDUs
----------------  -----  ------  ---------  ----------  ---------  ----------
dvSwitch_TEST         0  vmnic0          0          10          0         116
dvSwitch_TEST         0  vmnic1          0          10          0         116
These counters increase at a rate that depends on the LACP timer, i.e. slow or fast. This is discussed later in this post.
esxcli network vswitch dvs vmware lacp status get
The status get namespace shows the detailed LACP configuration. The most important information is:
- LACP timer, fast or slow
- LACP mode, active or passive
An example output is shown below; take a close look at the Flags lines.
/var/log # esxcli network vswitch dvs vmware lacp status get
dvSwitch_TEST
   DVSwitch: dvSwitch_TEST
   Flags: S - Device is sending Slow LACPDUs, F - Device is sending fast LACPDUs, A - Device is in active mode, P - Device is in passive mode
   LAGID: 0
   Mode: Active
   Nic List:
         Local Information:
               Admin Key: 11
               Flags: SA
               Oper Key: 11
               Port Number: 32769
               Port Priority: 255
               Port State: ACT,AGG,SYN,COL,DIST,
         Nic: vmnic1
         Partner Information:
               Age: 00:00:05
               Device ID: aa:bb:cc:dd:ee:ff
               Flags: SA
               Oper Key: 6
               Port Number: 11
               Port Priority: 127
               Port State: ACT,AGG,SYN,COL,DIST,
         State: Bundled

         Local Information:
               Admin Key: 11
               Flags: SA
               Oper Key: 11
               Port Number: 32768
               Port Priority: 255
               Port State: ACT,AGG,SYN,COL,DIST,
         Nic: vmnic0
         Partner Information:
               Age: 00:00:05
               Device ID: aa:bb:cc:dd:ee:ff
               Flags: SA
               Oper Key: 6
               Port Number: 20
               Port Priority: 127
               Port State: ACT,AGG,SYN,COL,DIST,
         State: Bundled
In the example above, the flag is set to SA, which means Slow and Active. Active/Passive mode can be changed via the GUI, but the LACP timer cannot. It can be changed via esxcli, as explained shortly.
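The flag letters can be decoded mechanically using the legend printed in the status output (S, F, A, P). This is just an illustrative Python sketch; the function name and the dictionary keys are my own and not part of esxcli.

```python
# Meaning of each letter, taken from the "Flags:" legend in the status output.
FLAG_MEANINGS = {
    "S": ("timer", "slow"),
    "F": ("timer", "fast"),
    "A": ("mode", "active"),
    "P": ("mode", "passive"),
}


def decode_lacp_flags(flags):
    """Decode a Flags value such as 'SA' into its timer and mode."""
    decoded = {}
    for letter in flags.strip():
        if letter not in FLAG_MEANINGS:
            raise ValueError(f"unknown LACP flag letter: {letter!r}")
        key, value = FLAG_MEANINGS[letter]
        decoded[key] = value
    return decoded


print(decode_lacp_flags("SA"))
print(decode_lacp_flags("FA"))
```

Raising on an unknown letter is a deliberate choice here, so that an unexpected flag is noticed instead of being silently ignored.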
esxcli network vswitch dvs vmware lacp timeout set
The timeout set namespace allows you to change the LACP timer to either slow or fast. This is a very important element to consider, as it has to match on both sides, i.e. ESXi and the physical switch.
Before going through the timeout set namespace, let's take a look at the physical switch configuration. The example below is from Juniper QFabric:
show configuration interfaces ABCD
description ABCD;
mtu 9216;
aggregated-ether-options {
    minimum-links 1;
    link-speed 10g;
    lacp {
        active;
        periodic fast;
    }
}
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members [ 10 11 12 13 14 15 ];
        }
    }
}
In the example above, LACP is set to fast, which means LACP on the dvSwitch needs to be configured as fast as well. Running esxcli network vswitch dvs vmware lacp status get will show you the LACP timer, as described above. By default it is set to slow, so it needs to be changed to fast.
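To avoid misreading the switch side, the periodic statement can be pulled out of the switch configuration text and mapped to the value the ESXi side expects (1 for the short timeout, i.e. fast, and 0 for the long timeout, i.e. slow). This is a rough Python sketch under the assumption that the configuration is available as plain text like the example above; the function name is hypothetical.

```python
import re

# Abridged from the Juniper example above.
SAMPLE_JUNOS = """\
aggregated-ether-options {
    lacp {
        active;
        periodic fast;
    }
}
"""


def esxcli_timeout_for(junos_config):
    """Return 1 (short/fast) or 0 (long/slow) to match the switch, or None."""
    match = re.search(r"\bperiodic\s+(fast|slow)\s*;", junos_config)
    if match is None:
        # No explicit statement found: check the switch default by hand.
        return None
    return 1 if match.group(1) == "fast" else 0


print(esxcli_timeout_for(SAMPLE_JUNOS))
```

Returning None when no periodic statement is present leaves the decision to you rather than guessing the switch default.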
~ # esxcli network vswitch dvs vmware lacp timeout set
Error: Missing required parameter -l|--lag-id
       Missing required parameter -s|--vds
       Missing required parameter -t|--timeout

Usage: esxcli network vswitch dvs vmware lacp timeout set [cmd options]

Description:
  set                   Set long/short timeout for vmnics in one LACP LAG

Cmd options:
  -l|--lag-id=<long>    The ID of LAG to be configured. (required)
  -n|--nic-name=<str>   The nic name. If it is set, then only this vmnic in the lag will be configured.
  -t|--timeout          Set long or short timeout: 1 for short timeout and 0 for long timeout. (required)
  -s|--vds=<str>        The name of VDS. (required)
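If the same change has to be made on several hosts, the options listed in the usage text above can be assembled into an argument list first. This is only a sketch in Python: the helper name is hypothetical, it uses only the options shown in the usage text, and it builds the command without running it.

```python
def build_timeout_command(vds, lag_id, timer, nic=None):
    """Build the argument list for `lacp timeout set` (does not execute it)."""
    if timer not in ("fast", "slow"):
        raise ValueError("timer must be 'fast' or 'slow'")
    # Per the usage text: 1 is the short timeout, 0 is the long timeout.
    timeout = 1 if timer == "fast" else 0
    args = [
        "esxcli", "network", "vswitch", "dvs", "vmware", "lacp", "timeout", "set",
        f"--lag-id={lag_id}",
        f"--vds={vds}",
        f"--timeout={timeout}",
    ]
    if nic is not None:
        # Optional: restrict the change to a single vmnic in the LAG.
        args.append(f"--nic-name={nic}")
    return args


# Values taken from this post's example dvSwitch and LAG.
print(" ".join(build_timeout_command("dvSwitch_TEST", 0, "fast")))
```

Keeping the arguments as a list makes it straightforward to hand them to whatever you use to run commands on the host, without worrying about shell quoting.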
Using the status get namespace, the mandatory parameters can be obtained. In this case:
- --lag-id=0
- --vds=dvSwitch_TEST
- --timeout=1
Executing esxcli network vswitch dvs vmware lacp timeout set --lag-id=0 --vds=dvSwitch_TEST --timeout=1 produces no output if it's successful. Checking the status again, you will see the timer is set to fast:
/var/log # esxcli network vswitch dvs vmware lacp status get
dvSwitch_TEST
   DVSwitch: dvSwitch_TEST
   Flags: S - Device is sending Slow LACPDUs, F - Device is sending fast LACPDUs, A - Device is in active mode, P - Device is in passive mode
   LAGID: 0
   Mode: Active
   Nic List:
         Local Information:
               Admin Key: 11
               Flags: FA
               Oper Key: 11
               Port Number: 32769
               Port Priority: 255
               Port State: ACT,AGG,SYN,COL,DIST,
         Nic: vmnic1
         Partner Information:
               Age: 00:00:05
               Device ID: aa:bb:cc:dd:ee:ff
               Flags: FA
               Oper Key: 6
               Port Number: 11
               Port Priority: 127
               Port State: ACT,AGG,SYN,COL,DIST,
         State: Bundled

         Local Information:
               Admin Key: 11
               Flags: FA
               Oper Key: 11
               Port Number: 32768
               Port Priority: 255
               Port State: ACT,AGG,SYN,COL,DIST,
         Nic: vmnic0
         Partner Information:
               Age: 00:00:05
               Device ID: aa:bb:cc:dd:ee:ff
               Flags: FA
               Oper Key: 6
               Port Number: 20
               Port Priority: 127
               Port State: ACT,AGG,SYN,COL,DIST,
         State: Bundled
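Rather than scanning the Flags lines by eye, the status output can be reduced to one record per NIC and checked for the expected timer letter. The sketch below is Python with an abridged copy of the output above as sample text; the function names and record keys are hypothetical, and it assumes each NIC entry ends with a State line, as in this example.

```python
# Abridged from the status output above (only the lines the parser uses).
SAMPLE_STATUS = """\
dvSwitch_TEST
   Flags: S - Device is sending Slow LACPDUs, F - Device is sending fast LACPDUs
   Nic List:
         Local Information:
               Flags: FA
               Port State: ACT,AGG,SYN,COL,DIST,
         Nic: vmnic1
         Partner Information:
               Flags: FA
         State: Bundled
         Local Information:
               Flags: FA
         Nic: vmnic0
         Partner Information:
               Flags: FA
         State: Bundled
"""


def parse_lacp_status(text):
    """Return one dict per NIC with its local/partner flags and state."""
    records, current, section = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Local Information:":
            section = "local"
        elif line == "Partner Information:":
            section = "partner"
        elif line.startswith("Nic:"):
            current["nic"] = line.split(":", 1)[1].strip()
        elif line.startswith("Flags:") and section is not None:
            # The legend line is skipped because no section is open yet.
            current[section + "_flags"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
            records.append(current)
            current, section = {}, None
    return records


def nics_without_timer(records, letter):
    """List NICs whose local flags lack the expected timer letter (S or F)."""
    return [r["nic"] for r in records if letter not in r["local_flags"]]


records = parse_lacp_status(SAMPLE_STATUS)
print(nics_without_timer(records, "F"))
```

An empty list means every NIC in the LAG reports the fast timer locally; any name in the list is a NIC that still needs the timeout set command.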
Now let's check whether it sends/receives LACPDUs. The two runs below, taken about one second apart, show each LACPDU counter increasing by one.
~ # esxcli network vswitch dvs vmware lacp stats get
DVSwitch          LAGID  NIC     Rx Errors  Rx LACPDUs  Tx Errors  Tx LACPDUs
dvSwitch_TEST         0  vmnic1          0       72509          0       72355
dvSwitch_TEST         0  vmnic0          0      120912          0      369634
~ # esxcli network vswitch dvs vmware lacp stats get
DVSwitch          LAGID  NIC     Rx Errors  Rx LACPDUs  Tx Errors  Tx LACPDUs
dvSwitch_TEST         0  vmnic1          0       72510          0       72356
dvSwitch_TEST         0  vmnic0          0      120913          0      369635
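The comparison between the two runs can also be done by subtracting the counters. Here is a small Python sketch using the two outputs above as sample text; the function names are my own, and it assumes both snapshots list the same NICs.

```python
BEFORE = """\
DVSwitch       LAGID  NIC     Rx Errors  Rx LACPDUs  Tx Errors  Tx LACPDUs
dvSwitch_TEST      0  vmnic1          0       72509          0       72355
dvSwitch_TEST      0  vmnic0          0      120912          0      369634
"""

AFTER = """\
DVSwitch       LAGID  NIC     Rx Errors  Rx LACPDUs  Tx Errors  Tx LACPDUs
dvSwitch_TEST      0  vmnic1          0       72510          0       72356
dvSwitch_TEST      0  vmnic0          0      120913          0      369635
"""

COUNTERS = ("rx_errors", "rx", "tx_errors", "tx")


def parse_lacp_stats(text):
    """Map each NIC name to its counters from `lacp stats get` output."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have seven fields with a numeric LAG ID in the second.
        if len(parts) != 7 or not parts[1].isdigit():
            continue
        nic = parts[2]
        stats[nic] = dict(zip(COUNTERS, (int(value) for value in parts[3:])))
    return stats


def lacpdu_deltas(before, after):
    """Return the per-NIC counter increase between two snapshots."""
    first, second = parse_lacp_stats(before), parse_lacp_stats(after)
    return {
        nic: {name: second[nic][name] - first[nic][name] for name in COUNTERS}
        for nic in first
        if nic in second
    }


for nic, delta in lacpdu_deltas(BEFORE, AFTER).items():
    print(nic, delta)
```

With the fast timer, snapshots taken about a second apart should show rx and tx each growing by roughly one while both error counters stay at zero.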
Wrap-up
In this post, I went through how to use the esxcli command to:
- Check the status of LACP
- Check the detailed information
- Check the stats
- Configure LACP timer
I mainly focused on matching the LACP timer, as I had an issue with this. By default, VMware uses slow and Juniper QFabric uses fast, and as a result network flapping occurred.
Hope this helps 😀
Thank you, I'm glad I found your article because there is really not much information about this elsewhere!
Do you know where I can find more information about the load balancing algorithms?
I really don't know what choice to make and why 🙂
Very helpful for testing LACP in ESXi. Thanks a lot!!