[Linux] Troubleshooting bond interfaces on RHEL/CentOS 7.5

Article ID: KB36182 KB Last Updated: 06 Oct 2020 Version: 1.0
Summary:

Link redundancy plays an important role in production networks on Contrail compute nodes, and bond interfaces, whether tagged or untagged, are used to provide it.

This article describes how to troubleshoot bond interfaces on RHEL/CentOS 7.5.

Note: To configure bond interfaces, refer to KB36183 - [Linux] Configuring bond interfaces on RHEL/CentOS 7.5.

 

Solution:

To troubleshoot issues with bond interfaces on RHEL/CentOS 7.5, perform the following steps:

  1. Check the physical interfaces and bond interface status.

    1. Verify the status of the member interfaces and the bond interface with the following commands.

    2. If the physical interfaces are down, verify the physical cable, SFP, and switch port to which the server is connected.

root@localhost:~# ip link show
<clipped>
4: ens1f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 1000
   link/ether 90:e2:ba:cc:27:70 brd ff:ff:ff:ff:ff:ff
<clipped>
5: ens1f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 1000
   link/ether 90:e2:ba:cc:27:71 brd ff:ff:ff:ff:ff:ff
<clipped>
31: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
   link/ether 90:e2:ba:cc:27:70 brd ff:ff:ff:ff:ff:ff
[root@localhost ~]# ip address show
<clipped>
4: ens1f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
     link/ether 90:e2:ba:cc:27:70 brd ff:ff:ff:ff:ff:ff
<clipped>
8: ens1f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 90:e2:ba:cc:27:70 brd ff:ff:ff:ff:ff:ff
<clipped>
31: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 90:e2:ba:cc:27:70 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::92e2:baff:fecc:2770/64 scope link 
       valid_lft forever preferred_lft forever
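The check in step 1 can be scripted. The following is a minimal sketch, assuming the one-line output format of `ip -o link show`; the function name `bond_member_states` is hypothetical and not part of the original article. It prints the name and state of every interface whose master is the given bond.

```shell
#!/bin/sh
# Print "<interface> <state>" for every member of the given bond.
# Reads `ip -o link show` style output (one interface per line) on stdin.
bond_member_states() {
    bond="$1"
    awk -v bond="$bond" '
        {
            name = $2; sub(/:$/, "", name)      # "ens1f0:" -> "ens1f0"
            master = ""; state = ""
            for (i = 3; i < NF; i++) {
                if ($i == "master") master = $(i + 1)
                if ($i == "state")  state  = $(i + 1)
            }
            if (master == bond) print name, state
        }'
}

# Usage on a live host (bond0 is the bond name used in this article):
#   ip -o link show | bond_member_states bond0
```

Any member reported as DOWN points back to the physical checks in step 1 (cable, SFP, switch port).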
  2. If the physical interfaces are up but the bond interface is down, perform the following steps:

    1. Verify the bonding status by using the command cat /proc/net/bonding/bond0.

    2. Check the LACP parameters of the actor (the server's own side of the link) and confirm that they match the local configuration.

    3. Verify link failure counts and MII status and determine if any links are flapping.

    4. Capture packets on all member interfaces and verify all LACP parameters in the partner device configuration.

    5. Check dmesg logs and verify if there is any LACP parameter mismatch.

The dmesg and bonding outputs observed when the member links are reported down are shown below:

root@localhost:~# dmesg 
[186643.488850] ixgbe 0000:08:00.1 ens1f1: NIC Link is Down
[186643.489096] ixgbe 0000:08:00.1 ens1f1: speed changed to 0 for port ens1f1
[186643.499780] bond0: link status definitely down for interface ens1f1, disabling it
[186643.499790] bond0: first active interface up!
[186643.500145] ixgbe 0000:08:00.0 ens1f0: NIC Link is Down
[186643.599783] bond0: link status definitely down for interface ens1f0, disabling it
[186643.599793] bond0: now running without any active interface!
[186644.492143] fabric: port 1(bond0.2004) entered disabled state
[root@localhost ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 90:e2:ba:cc:27:70
bond bond0 has no active aggregator

Slave Interface: ens1f0
MII Status: down <<<<
Speed: Unknown
Duplex: Unknown
Link Failure Count: 5  <<<<
Permanent HW addr: 90:e2:ba:cc:27:70
Slave queue ID: 0
Aggregator ID: 8
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 90:e2:ba:cc:27:70
    port key: 0
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 4c:16:fc:52:e4:00
    oper key: 10
    port priority: 127
    port number: 3
    port state: 63

Slave Interface: ens1f1
MII Status: down  <<<<
Speed: Unknown
Duplex: Unknown
Link Failure Count: 2  <<<<
Permanent HW addr: 90:e2:ba:cc:27:71
Slave queue ID: 0
Aggregator ID: 8
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 90:e2:ba:cc:27:70
    port key: 0
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 4c:16:fc:52:e4:00
    oper key: 10
    port priority: 127
    port number: 2
    port state: 63
[root@localhost ~]# 
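The per-member fields highlighted in the output above (MII status and link failure count) can be pulled out with a short script. This is a sketch only, assuming the field layout shown in this article; the function name `bond_summary` is hypothetical.

```shell
#!/bin/sh
# Summarise a bonding status file: one line per member interface with its
# MII status and link failure count. Defaults to bond0's status file.
bond_summary() {
    awk '
        /^Slave Interface:/                   { slave = $3 }
        slave != "" && /^MII Status:/         { mii = $3 }
        slave != "" && /^Link Failure Count:/ { print slave, "mii=" mii, "failures=" $4 }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage on a live host:
#   bond_summary
#   bond_summary /proc/net/bonding/bond0
```

Running it at intervals and comparing the failure counts is one way to spot a flapping link, since the counter increases each time the member goes down.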
  3. LACP parameters can also be verified with tcpdump on the member interfaces.

    1. Run tcpdump on the member interfaces and inspect the LACP PDUs, which show the LACP parameters advertised by the adjacent device.

[root@localhost ~]# tcpdump -i ens1f0 -vv -s 1500
tcpdump: listening on ens1f0, link-type EN10MB (Ethernet), capture size 1500 bytes
13:34:19.618263 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.10.254 tell 192.168.10.75, length 46
13:34:19.728893 LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System 4c:16:fc:52:e4:00 (oui Unknown), System Priority 127, Key 10, Port 3, Port Priority 127
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
          0x0000:  007f 4c16 fc52 e400 000a 007f 0003 3f00
          0x0010:  0000
        Partner Information TLV (0x02), length 20
          System 90:e2:ba:cc:27:70 (oui Unknown), System Priority 65535, Key 15, Port 1, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
          0x0000:  ffff 90e2 bacc 2770 000f 00ff 0001 3f00
          0x0010:  0000
        Collector Information TLV (0x03), length 16
          Max Delay 0
          0x0000:  0000 0000 0000 0000 0000 0000 0000
        Terminator TLV (0x00), length 0
13:34:19.826359 LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System 90:e2:ba:cc:27:70 (oui Unknown), System Priority 65535, Key 15, Port 1, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
          0x0000:  ffff 90e2 bacc 2770 000f 00ff 0001 3f00
          0x0010:  0000
        Partner Information TLV (0x02), length 20
          System 4c:16:fc:52:e4:00 (oui Unknown), System Priority 127, Key 10, Port 3, Port Priority 127
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
          0x0000:  007f 4c16 fc52 e400 000a 007f 0003 3f00
          0x0010:  0000
        Collector Information TLV (0x03), length 16
          Max Delay 0
          0x0000:  0000 0000 0000 0000 0000 0000 0000
        Terminator TLV (0x00), length 0
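The `port state: 63` value in /proc/net/bonding/bond0 and the `State Flags` list printed by tcpdump describe the same LACP state byte (0x3f in the hex dump above). The following sketch decodes the decimal value into flag names using the bit order of the LACP state field; the function name `lacp_state_flags` is hypothetical.

```shell
#!/bin/sh
# Decode an LACP port state byte (given in decimal) into flag names.
# Bit 0 is Activity, bit 1 Timeout, and so on up to bit 7 Expired.
lacp_state_flags() {
    state="$1"
    flags=""
    bit=1
    for name in Activity Timeout Aggregation Synchronization \
                Collecting Distributing Defaulted Expired; do
        if [ $(( state & bit )) -ne 0 ]; then
            flags="${flags:+$flags, }$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "${flags:-none}"
}

lacp_state_flags 63
# prints: Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing
```

A state in which Synchronization, Collecting, or Distributing is missing, or in which Defaulted or Expired is set, suggests that the two ends have not agreed on the aggregation and is worth comparing against the partner device configuration.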
  4. Write the tcpdump capture to a PCAP file by using the following commands. The PCAP files can then be analyzed later.

tcpdump -n -i ens1f0 -w /tmp/$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")_slave1.pcap
tcpdump -n -i ens1f1 -w /tmp/$(hostname)-$(date +"%Y-%m-%d-%H-%M-%S")_slave2.pcap
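If only the LACP exchange is of interest, the capture can be narrowed with a filter. LACP frames are carried with the Slow Protocols EtherType 0x8809, so the filter `ether proto 0x8809` keeps those frames (along with any other Slow Protocols traffic) and drops the rest. The sketch below only prints the command lines so that they can be reviewed before being run; the function name `lacp_capture_cmd`, the default frame count, and the file name suffix are assumptions made for illustration.

```shell
#!/bin/sh
# Build a tcpdump command line that captures only Slow Protocols frames
# (EtherType 0x8809, which includes LACP) on one member interface.
# The command is printed rather than executed.
lacp_capture_cmd() {
    iface="$1"
    count="${2:-20}"   # number of frames to capture (illustrative default)
    # uname -n is used here in place of hostname; both report the node name on Linux
    out="/tmp/$(uname -n)-$(date +%Y-%m-%d-%H-%M-%S)_${iface}_lacp.pcap"
    printf 'tcpdump -n -i %s -c %s -w %s ether proto 0x8809\n' "$iface" "$count" "$out"
}

for iface in ens1f0 ens1f1; do
    lacp_capture_cmd "$iface"
done
```

With the LACP rate set to fast, as in the outputs in this article, each side sends a PDU about once per second, so a small frame count is usually enough to see both directions.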
  5. If only one member link is down, the bond interface should remain up and operational.

    1. The following are examples of the dmesg logs and command outputs observed when one of the member ports is down.

root@localhost:~# dmesg 
[186414.155971] ixgbe 0000:08:00.0 ens1f0: NIC Link is Down
[186414.156209] ixgbe 0000:08:00.0 ens1f0: speed changed to 0 for port ens1f0
[186414.179806] bond0: link status definitely down for interface ens1f0, disabling it


[root@localhost ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 90:e2:ba:cc:27:70
Active Aggregator Info:
        Aggregator ID: 8
        Number of ports: 1
        Actor Key: 15
        Partner Key: 10
        Partner Mac Address: 4c:16:fc:52:e4:00

Slave Interface: ens1f0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 3
Permanent HW addr: 90:e2:ba:cc:27:70
Slave queue ID: 0
Aggregator ID: 8
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 90:e2:ba:cc:27:70
    port key: 0
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 4c:16:fc:52:e4:00
    oper key: 10
    port priority: 127
    port number: 3
    port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 90:e2:ba:cc:27:71
Slave queue ID: 0
Aggregator ID: 8
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 90:e2:ba:cc:27:70
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 4c:16:fc:52:e4:00
    oper key: 10
    port priority: 127
    port number: 2
    port state: 63
[root@localhost ~]# 
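In the output above the bond is up, but `Number of ports: 1` shows that only one of the two members is in the active aggregator. A degraded bond like this can be detected by comparing that value with the number of `Slave Interface` sections. This is a sketch under the assumption that the status file has the layout shown in this article; the function name `bond_degraded` and its exit codes are hypothetical.

```shell
#!/bin/sh
# Compare the active aggregator's "Number of ports" with the number of
# member interfaces listed in the bonding status file.
# Exit status: 0 = all members active, 1 = degraded, 2 = no active aggregator.
bond_degraded() {
    awk '
        /^Slave Interface:/ { members++ }
        /Number of ports:/  { active = $NF }
        END {
            if (active == "") { print "no active aggregator"; exit 2 }
            if (active + 0 < members) {
                printf "degraded: %d of %d members active\n", active, members
                exit 1
            }
            printf "ok: %d of %d members active\n", active, members
        }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage on a live host:
#   bond_degraded || echo "check member links of bond0"
```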
  6. Verify the bond interface status and dmesg logs when all members are restored and up. Example outputs of a working bond interface are given below.

    1. Member interfaces and the bond interface are up. dmesg prints logs as shown below.

root@localhost:~# dmesg 
[186806.146200] ixgbe 0000:08:00.1 ens1f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[186806.213235] bond0: link status definitely up for interface ens1f1, 10000 Mbps full duplex
[186806.213248] bond0: first active interface up!
[186806.213321] fabric: port 1(bond0.2004) entered blocking state
[186806.213325] fabric: port 1(bond0.2004) entered listening state
[186806.288215] ixgbe 0000:08:00.0 ens1f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[186806.313231] bond0: link status definitely up for interface ens1f0, 10000 Mbps full duplex
[186821.246393] fabric: port 1(bond0.2004) entered learning state
[186836.287625] fabric: port 1(bond0.2004) entered forwarding state
[186836.287641] fabric: topology change detected, propagating
[root@localhost ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 90:e2:ba:cc:27:70
Active Aggregator Info:
        Aggregator ID: 8
        Number of ports: 2
        Actor Key: 15
        Partner Key: 10
        Partner Mac Address: 4c:16:fc:52:e4:00

Slave Interface: ens1f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 5
Permanent HW addr: 90:e2:ba:cc:27:70
Slave queue ID: 0
Aggregator ID: 8
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 90:e2:ba:cc:27:70
    port key: 15
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 4c:16:fc:52:e4:00
    oper key: 10
    port priority: 127
    port number: 3
    port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 90:e2:ba:cc:27:71
Slave queue ID: 0
Aggregator ID: 8
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 90:e2:ba:cc:27:70
    port key: 15
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 127
    system mac address: 4c:16:fc:52:e4:00
    oper key: 10
    port priority: 127
    port number: 2
    port state: 63
[root@localhost ~]# 
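To judge whether a member link is flapping, the `link status definitely down` messages that appear in the dmesg outputs in this article can be counted per interface. The following is a minimal sketch that relies on that message wording; the function name `bond_down_events` is hypothetical.

```shell
#!/bin/sh
# Count bonding "link status definitely down" events per member interface.
# Reads kernel log text on stdin and prints "<interface> <count>" lines.
bond_down_events() {
    awk '
        /link status definitely down for interface/ {
            for (i = 1; i < NF; i++)
                if ($i == "interface") {
                    name = $(i + 1)
                    sub(/,$/, "", name)     # "ens1f1," -> "ens1f1"
                    count[name]++
                }
        }
        END { for (n in count) print n, count[n] }
    ' | sort
}

# Usage on a live host:
#   dmesg | bond_down_events
```

Note that the kernel log buffer is limited in size, so older events may no longer be present; the Link Failure Count in /proc/net/bonding/bond0 is the more complete counter.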

 
