[SRX] Manual reth failover fails due to multicast traffic

  [KB33533]


Summary:

On high-end SRX devices, after a manual failover of the data plane, both nodes continue to receive (expected) and transmit (not expected) multicast packets at the same time; unicast packets are not affected. The connected switch interprets this as MAC flapping and disables MAC learning in the transit firewall VLAN.

Symptoms:

Topology:

Multicast source------(xe-1/0/1 and xe-9/0/1)reth3:SRX-node0/node1:reth1(xe-1/0/0 and xe-9/0/0)------Multicast receiver

After RG1 fails over from node0 to node1, the output packet counters on both nodes keep increasing for a while (at least several minutes).

Monitor interface traffic:

Interface    Link  Input packets        (pps)     Output packets        (pps)
xe-1/0/0      Up    17387609175    (1331856)      30288527867    (1331852) <-- receiver side is sending traffic out on node0
xe-1/0/1      Up    21698592781    (1331850)      34359738368          (0)

xe-9/0/0      Up     8672701926          (0)      17387303398    (1330512) <-- receiver side is sending traffic out on node1
xe-9/0/1      Up     9660109332    (1330514)      17179869207          (0)
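
The symptom can be checked mechanically. The following Python sketch flags any reth whose member links on both nodes show a non-zero output rate. It is an illustration only: the row layout is assumed to match the monitor output shown in this article, the reth-to-member mapping is copied from the topology line, and the function names `output_pps` and `dual_transmit` are hypothetical helpers, not part of Junos.

```python
import re

# Sample rows copied from the monitor output in this article.
SAMPLE = """\
xe-1/0/0      Up    17387609175    (1331856)      30288527867    (1331852)
xe-1/0/1      Up    21698592781    (1331850)      34359738368          (0)
xe-9/0/0      Up     8672701926          (0)      17387303398    (1330512)
xe-9/0/1      Up     9660109332    (1330514)      17179869207          (0)
"""

# Member links per reth, taken from the topology line (assumed mapping).
RETH_MEMBERS = {
    "reth1": ("xe-1/0/0", "xe-9/0/0"),
    "reth3": ("xe-1/0/1", "xe-9/0/1"),
}

# name, link state, input packets, (input pps), output packets, (output pps)
# The final closing parenthesis is optional so a truncated row still parses.
ROW = re.compile(r"^(\S+)\s+\S+\s+\d+\s+\((\d+)\)\s+\d+\s+\((\d+)\)?")


def output_pps(text):
    """Return {interface: output pps} for every row that matches ROW."""
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rates[m.group(1)] = int(m.group(3))
    return rates


def dual_transmit(rates, members=RETH_MEMBERS):
    """Return the reth names whose member links all show non-zero output pps."""
    return [
        reth
        for reth, links in members.items()
        if all(rates.get(link, 0) > 0 for link in links)
    ]


if __name__ == "__main__":
    print(dual_transmit(output_pps(SAMPLE)))
```

With the sample rows, only reth1 is reported, because xe-1/0/0 and xe-9/0/0 both transmit while the reth3 members show an output rate of 0.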

root@test123> show interfaces terse | match reth1  <-- This is receiver
xe-1/0/0.0              up    up   aenet    --> reth1.0
xe-9/0/0.0              up    up   aenet    --> reth1.0
reth1                   up    up
reth1.0                 up    up   inet     10.99.62.242/30

{primary:node0}
root@test123> show interfaces terse | match reth3 <-- This is pim source
xe-1/0/1.0              up    up   aenet    --> reth3.0
xe-9/0/1.0              up    up   aenet    --> reth3.0
reth3                   up    up
reth3.0                 up    up   inet     10.99.62.226/30
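
The reth membership shown in the two `show interfaces terse` outputs can also be collected programmatically. The following Python sketch builds a reth-to-member map from such output. It assumes the column layout shown in this article; `reth_members` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Member rows copied from the "show interfaces terse" outputs in this article.
TERSE = """\
xe-1/0/0.0              up    up   aenet    --> reth1.0
xe-9/0/0.0              up    up   aenet    --> reth1.0
xe-1/0/1.0              up    up   aenet    --> reth3.0
xe-9/0/1.0              up    up   aenet    --> reth3.0
"""

# logical interface, admin state, link state, "aenet", arrow, parent reth unit
MEMBER = re.compile(r"^(\S+?)\.\d+\s+\S+\s+\S+\s+aenet\s+-->\s+(reth\d+)\.\d+")


def reth_members(text):
    """Return {reth name: [physical member interfaces]} from terse output."""
    members = defaultdict(list)
    for line in text.splitlines():
        m = MEMBER.match(line.strip())
        if m:
            members[m.group(2)].append(m.group(1))
    return dict(members)


if __name__ == "__main__":
    for reth, links in reth_members(TERSE).items():
        print(reth, links)
```

Rows that are not member links, such as the reth interface itself and its address line, do not match the pattern and are skipped.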

Both reth1 and reth3 are in the routing instance vr-prod.

{primary:node0}
root@test123> show igmp group
Interface: reth1.0, Groups: 1
    Group: 239.121.100.1
        Source: 0.0.0.0
        Last reported by: 10.99.62.241
        Timeout:     218 Type: Dynamic
Interface: local, Groups: 3
    Group: 224.0.0.2
        Source: 0.0.0.0
        Last reported by: Local
        Timeout:       0 Type: Dynamic
    Group: 224.0.0.13
        Source: 0.0.0.0
        Last reported by: Local
        Timeout:       0 Type: Dynamic
    Group: 224.0.0.22
        Source: 0.0.0.0
        Last reported by: Local
        Timeout:       0 Type: Dynamic
Cause:

This issue may occur under the following conditions:

  • SRX1400, SRX3k and SRX5k platforms with active/passive chassis cluster configuration
  • Services-offload is enabled
  • PIM-SM and IGMP are configured
  • RG1 is manually failed over
Solution:

For the list of fixed releases, refer to PR1323024 - "The device might stop forwarding traffic after RG1 failover from node0 to node1".

The fix deletes the services-offload (SOF) NP session when the backup node receives a copy packet, which prevents the network processors (NPs) on both nodes from sending traffic at the same time.
