This article describes the cluster stability issues that can occur when ports in a WLC port group fail.
When a WLC device is part of a Cluster configuration, it exchanges Keep-alive (KA) packets with the Primary and Secondary SEEDs in the Mobility-Domain so that the network status of all devices is known. By default, these KA packets are exchanged every 30ms; if there is no acknowledgement after 5 attempts, the device may be marked as Down in the cluster (a total time of 5x30ms=150ms).
When a WLC device is uplinked to the network via a Port Group, traffic exits the ports of the device in a deterministic fashion. When a port in the port group is detected as down, the port group reconfigures the traffic flow to stop using that port. However, an issue may arise when the time taken to detect the port as down is longer than the default cluster keep-alive time (150ms): the keep-alive time can expire before the port group has reconfigured, and the device may be marked as Down in the cluster.
To resolve this issue, reconfigure the Cluster Keep-alive interval to a value larger than the default of 30ms. A value of 300ms has been found to interoperate well with both Juniper WLCs and various other network vendors. This allows a port group to reconfigure after the loss of a port (total cluster time of 5x300ms=1500ms) without the Cluster Keep-alive time expiring.
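The timing arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of any WLC or RingMaster tooling: the function names are hypothetical, and the 500ms port-down detection time is an assumed example value, since the actual detection time depends on the connected equipment.

```python
# Illustrative sketch only; the function names are hypothetical.

KA_ATTEMPTS = 5  # unacknowledged keep-alives before a device may be marked Down


def cluster_timeout_ms(interval_ms: int, attempts: int = KA_ATTEMPTS) -> int:
    """Total time before the cluster keep-alive expires."""
    return interval_ms * attempts


def survives_port_failure(interval_ms: int, port_down_detect_ms: int,
                          attempts: int = KA_ATTEMPTS) -> bool:
    """True if the port is detected as down before the keep-alive time expires."""
    return cluster_timeout_ms(interval_ms, attempts) > port_down_detect_ms


# Assumed example: the port group needs 500ms to detect a failed port.
ASSUMED_DETECT_MS = 500

for interval in (30, 300):
    total = cluster_timeout_ms(interval)
    ok = survives_port_failure(interval, ASSUMED_DETECT_MS)
    print(f"interval={interval}ms total={total}ms survives={ok}")
```

With the assumed 500ms detection time, the default 30ms interval (150ms total) expires before the port group reconfigures, while the 300ms interval (1500ms total) does not.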
You can set the Cluster Keep-alive interval to 300ms from the CLI of the Primary SEED, via the following command:
set cluster keepalive interval 300
You can also configure the Cluster Keep-alive interval in RingMaster; go to Configuration Page > Cluster Level > Keepalive Configuration.