[QFabric] Multicast traffic black-holing due to PFE mis-programming on Junos OS 14.1X53-D122

  [KB34645]


Summary:

On QFabric solutions that run Junos OS release 14.1X53-D122, users may find that the L3MCAST_V4 route update with core-key 0 is ignored. As a result, (source, group) entries are missing from the Packet Forwarding Engine (PFE) and multicast traffic is dropped completely.

This article explains why this happens and gives a couple of workarounds, in addition to listing the Junos OS release in which the problem has been resolved.

Symptoms:

When core-key 0 is assigned to an active flow, the IPMC table entries are not programmed correctly on the nodes that are connected to the receivers or the source. This breaks multicast forwarding for that flow and results in a complete traffic drop.

Cause:

Below are two scenarios and observations:

  • When the first receiver joins and is allocated core-key 0: The new receiver node (say, RSNG7) receives the flow (S1,G1) with core-key 0 and therefore does not process it or program the IPMC table. Hence, no IPMC entry exists.

  • When a new receiver joins and the MoD (Module ID) combination changes: The existing RSNGs (RSNG1, RSNG2, and RSNG3) already have an IPMC entry based on core-key 75 (or whichever core key was assigned to the MoD combination at that time). Because the new RSNG7 has been added to the list, NNG changes the core key from 75 to 0. The RSNG does not process the update with core-key 0 and instead continues to use its existing IPMC entry. Although that entry is enough to forward traffic, the hashing has changed with the new combination (RSNG1, RSNG2, RSNG3, and RSNG7), so traffic is forwarded only if the new hash happens to select the same set of HG ports as before.

When a new multicast flow (sender or receiver) becomes active, it requires a core-key assignment on NNG. However, if NNG has already used all 126 unique combinations, it assigns core-key 0 to the new flow. In the problem case explained above, RSNG7 ignores this update and the flow stops working.
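The allocation behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name, the sequential key numbering, and the use of node names as members are hypothetical and do not reflect how Junos OS implements core-key assignment. The only facts taken from this article are the limit of 126 unique combinations and the fallback to core-key 0.

```python
# Toy model only. Names and data structures are hypothetical, not Junos internals.
MAX_CORE_KEYS = 126  # limit of unique combinations stated in this article


class CoreKeyAllocator:
    """Maps each unique member-node combination to a core key."""

    def __init__(self, max_keys=MAX_CORE_KEYS):
        self.max_keys = max_keys
        self.key_by_members = {}  # frozenset of node names -> core key

    def assign(self, members):
        combo = frozenset(members)
        if combo in self.key_by_members:
            # A known combination reuses its existing core key.
            return self.key_by_members[combo]
        if len(self.key_by_members) >= self.max_keys:
            # All keys are in use: the new combination falls back to core-key 0.
            return 0
        # Assumption: keys are handed out sequentially starting at 1.
        key = len(self.key_by_members) + 1
        self.key_by_members[combo] = key
        return key


# Small demonstration with an artificially low limit of 2 keys.
alloc = CoreKeyAllocator(max_keys=2)
first = alloc.assign({"RSNG1", "RSNG2", "RSNG3"})
second = alloc.assign({"RSNG4"})
# Adding RSNG7 creates a new combination after the keys are exhausted.
third = alloc.assign({"RSNG1", "RSNG2", "RSNG3", "RSNG7"})
```

In this model, the third call returns 0 because the new receiver changes the member combination at a time when no free key is left, which mirrors the second scenario above.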

Issue Verification:

The following commands identify whether core-key “0” is assigned and referred to by any of the (S,G) entries on the system.

  1. Identify the (S,G) entries that have been assigned core-key 0 by using the following command. Run it on all affected node groups (RSNG/NNG):
RSNG/NNG> show fabric multicast edge key-vlan-group
    Core key  L2 domain  Group
    0         3          3.223.255.255.1  
    0         16842765   239.202.100.33, 19.19.19.10  <active flow getting core key zero>
    0         16842765   239.202.100.33, 21.21.21.10  <active flow getting core key zero>
    123       16842765   225.55.55.55, 11.11.11.10
    123       16842765   239.201.1.25, 11.11.11.10
    123       16842765   239.201.1.25, 14.14.14.10
    125       16842765   225.0.0.55, 15.15.15.10
    125       16842765   225.0.0.55, 18.18.18.10
    4098      2          2.255.255.255.255
    4107      11         11.224.0.0.0     
    4107      11         11.255.255.255.255
    4108      12         12.224.0.0.0     
    4108      12         12.255.255.255.255 
  2. To proactively check whether all 126 core keys are used, run the following command on NNG to see if all 126 core keys are listed.

    Note: The first column (Key) represents the core key number and the second column (Ref) indicates the number of active flows (source, group) that are using the core key. If the Ref value is 0, it means that the core key is free and no active flow is using that core key.

fabric-admin@NW-NG-0> show fabric multicast root core-key-to-map

Key    Ref count  Member map                  L2 Domain  Group  Source
1      5          300000084085A2:6/12         n/a        n/a    n/a
2      3          84085A2:30/10               n/a        n/a    n/a
3      1          100000000582:430/8          n/a        n/a    n/a
4      3          7F7C08000D4005C6:880001/26  n/a        n/a    n/a
5      1          300000000085A2/8            n/a        n/a    n/a
<<Cut for brevity>>
125    1          30C04CA28D05C6:1F8/26       n/a        n/a    n/a
126    1          102000000582:400/7          n/a        n/a    n/a
Ref count: The number of flows using that core key.
  3. Check the PFE map assigned to these flows:

root@NW-NG-0> show fabric multicast root layer3-group-membership-entries | find 239.202  
May 29 06:14:08
    Group:Source:                  239.202.100.33, 19.19.19.10
    Multicast key:                 0
    Packet Forwarding map:         DE000/6
    New Packet Forwarding map:     0/0
    Mrouter Packet Forwarding map: 0/0
    Interim multicast key:         0
    Interim Packet Forwarding map: 0/0
    Flags:                         0x8
    Routing table:                 T026-TEST---qfabric.inet.1(0x101000D)
    L2 domain:                     0
    Group:Source:                  239.202.100.33, 21.21.21.10
    Multicast key:                 0
    Packet Forwarding map:         DE000/6
    New Packet Forwarding map:     0/0
    Mrouter Packet Forwarding map: 0/0
    Interim multicast key:         0
    Interim Packet Forwarding map: 0/0
    Flags:                         0x8

Note: The PFE map states the combination of LCs/nodes connected to the receivers. The PFE map in the above case is 0xDE000.

0xDE000 in binary is 1101 1110 0000 0000 0000
The set bits map to PFEs 13, 14, 15, 16, 18, and 19:

    Position =   19 18 17 16   15 14 13 12   11 ... 0
       DE000 =    1  1  0  1    1  1  1  0   0000 0000 0000

Note: Positions start from 0, 1, 2, and so on from right to left.
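The bit-position reading above can be reproduced with a short Python sketch. The function name is hypothetical, and the sketch handles only the simple `HEX/N` form shown for this flow. It ignores the part after the slash, which in the examples in this article matches the number of set bits (an observation, not a documented guarantee). The node names are the ones listed in this article for this particular system.

```python
def decode_member_map(member_map):
    """Return the set bit positions (bit 0 is the rightmost) of a map such as 'DE000/6'."""
    hex_part = member_map.split("/")[0]
    value = int(hex_part, 16)
    # Walk every bit from right to left and keep the positions that are set.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# PFE (MoD ID) to node mapping as listed in this article for this system.
NODE_BY_PFE = {
    13: "qf1-nng-1",
    14: "qf1-rsng5-1",
    15: "qf1-rsng6-1",
    16: "qf1-rsng4-1",
    18: "qf1-rsng2-2",
    19: "qf1-nng-2",
}

positions = decode_member_map("DE000/6")
nodes = [NODE_BY_PFE.get(p, "unknown") for p in positions]
print(positions)  # [13, 14, 15, 16, 18, 19]
print(nodes)
```

On another system, the mapping table has to be rebuilt from the MoD IDs reported by each node.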

239.202.100.33, 19.19.19.10

The above flow is active on the six LCs that 0xDE000 maps to, and it is assigned core-key 0:

qf1-nng-1       =>13
qf1-rsng5-1     =>14
qf1-rsng6-1     =>15
qf1-rsng4-1     =>16
qf1-rsng2-2     =>18
qf1-nng-2       =>19

To check the MoD ID:

TFXPC0(vty)# set dcbcm bcmshell "stkm"
HW (unit 0)
STKMode: unit 0: module id 13

TFXPC0(vty)#
  4. To verify core-key assignment with another command, use the following:

qfabric-admin@NW-NG-0> show route fabric table default.fabric.0 extensive fabric-route-type layer3-mcast-routes  | find 239.202.100.33 

Path 177:19.19.19.10:239.202.100.33:176(L3MCAST_V4) Vector len 4.  Val: 0
        *Fabric Preference: 40
                Next hop type: Discard
                Address: 0x9420c24
                Next-hop reference count: 1204
                State: <Active Int AlwaysFlash>
                Age: 17:43
                Validation State: unverified
                Task: DCF
                Announcement bits (3): 1-BGP Route Target 2-BGP_RT_Background 3-Resolve tree 3
                AS path: I
                Communities: target:65534:100663472(L2:176)
                SNPA count: 1, SNPA length: 0
                SNPA Type: PFE Port SNPA
                PFE ID: 27, Port ID: 27
                Core-key: 0
  5. To verify the PFE map to core-key assignment for 0xDE000, use the following:

qfabric-admin@NW-NG-0> show fabric multicast root map-to-core-key

    Key    Extra flood Ref count Flag   Member map
    0      0           609       8      */256
    13     1           318       A      182000/3
    74     1           18        A      182800/4
    54     1           21        A      193000/5
    114    1           2         8      1A7000/6
    107    1           2         8      1A7800/7
    16     1           64        A      1B2000/5
    15     250         22        A      1B2800/6
<truncated>
    93     1           2         10     1E6800/7
    111    1           2         10     97000/5
    61     1           2         10     9A000/4
    99     1           2         10     B6000/5
    92     1           2         10     B7000/6
    123    1           2         10     D6000/5
    118    1           2         10     D6800/6
    100    1           2         10     D7800/7
    0      1           2         10     DA000/5      <PFE map getting core key zero>
    0      250         6         10     DE000/6      <PFE map getting core key zero>
    93     1           2         10     E6800/6
    94     1           2         10     F6000/6
root@NW-NG-0:RE:9% 

Note: The PFE map DE000/6 is assigned core-key 0, which means that any active flow (source, group) associated with this PFE map is affected. In the above case, the flow with group 239.202.100.33 and source 19.19.19.10 is associated with DE000/6, so its traffic can be black-holed due to PFE mis-programming.
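When the map-to-core-key output is long, the check for PFE maps that carry core-key 0 can be scripted against saved output. The following Python sketch is a minimal illustration: the function name is hypothetical, the column order (Key, Extra flood, Ref count, Flag, Member map) is taken from the output shown in this article, and skipping the `*/256` row is an assumption that it is a wildcard entry rather than a real PFE combination.

```python
# Sample rows copied from the output shown in this article.
SAMPLE_OUTPUT = """\
    Key    Extra flood Ref count Flag   Member map
    0      0           609       8      */256
    123    1           2         10     D6000/5
    0      1           2         10     DA000/5
    0      250         6         10     DE000/6
    93     1           2         10     E6800/6
"""


def maps_with_core_key_zero(output):
    """Return (member_map, ref_count) for every non-wildcard row with core key 0."""
    hits = []
    for line in output.splitlines():
        fields = line.split()
        # Skip headers, truncation markers, and anything that is not a data row.
        if len(fields) < 5 or not fields[0].isdigit() or not fields[2].isdigit():
            continue
        key, ref_count, member_map = int(fields[0]), int(fields[2]), fields[4]
        # Assumption: a map starting with '*' is a wildcard entry, not a PFE combination.
        if key == 0 and not member_map.startswith("*"):
            hits.append((member_map, ref_count))
    return hits


affected = maps_with_core_key_zero(SAMPLE_OUTPUT)
for member_map, ref_count in affected:
    print(f"{member_map} has core key 0 with ref count {ref_count}")
```

Any member map reported by this sketch should then be matched against the flows shown by the layer3-group-membership-entries command to find the affected (source, group) pairs.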

Solution:

The problem of RSNG not processing the core-key “0” update has been resolved in Junos OS 14.1X53-D140 (PR1437536).

Meanwhile, a couple of workarounds that you can use are as follows:

  • Disable and re-enable IGMP-snooping membership on the impacted RSNG's VLANs (this does not always resolve the issue).

  • Restart RPDF/Fabric on NWNG if old core-key map entries are not freed up.
