[SRX] Example - RG state transitions during autorecovery of fabric link on a chassis cluster

  [KB32508]


Summary:

This article provides an example of HA RG (Redundancy Group) state transitions during autorecovery of the fabric link.

Symptoms:

By default, if the fabric link goes down, RG1+ becomes ineligible on the secondary node (or the node with failures).

Junos OS 12.1X46-D20 introduced an autorecovery feature for the fabric link. Once the fabric link is back up, the status returns to normal without a reboot. However, the RGs do pass through the lost status on the secondary node (or the node with failures) during autorecovery.

Solution:

Because of the fabric link autorecovery feature, there is no need to reboot the secondary node to restore the status; instead, all FPCs on the secondary node (or the node with failures) automatically soft-restart.

Due to the FPC auto soft-restart, all interfaces on the secondary node (or the node with failures), including the control link, temporarily go down. As a result, both RG0 and RG1+ change to the lost status until the soft-restart completes.
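
The transitions described above can be summarized as a small lookup keyed on the statuses that the affected node reports for RG0 and RG1. This is only a sketch built from the single example in this article: the phase labels and the `classify` helper are hypothetical names, and a lost status can also have other causes (for example, a node that is actually down), so the mapping should be read as a hint rather than a diagnosis.

```python
# (RG0 status, RG1 status) on the affected node, as observed in this
# article's example only. Phase labels are illustrative, not Junos terms.
PHASES = {
    ("secondary", "ineligible"): "fabric link down",
    ("lost", "lost"): "possible autorecovery in progress",
    ("secondary", "secondary"): "recovered",
}


def classify(rg0_status, rg1_status):
    """Map the affected node's RG0/RG1 statuses to a phase label."""
    return PHASES.get((rg0_status, rg1_status), "unknown")


print(classify("secondary", "ineligible"))
print(classify("lost", "lost"))
print(classify("secondary", "secondary"))
```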

 

Example:

Disconnect Fabric Link

lab@srx345-1> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring             
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring
 
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures
 
Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None          
node1  0        secondary      no      no       FL            
 
Redundancy group: 1 , Failover count: 1
node0  100      primary        no      no       None          
node1  0        ineligible     no      no       FL    ------ RG1 becomes ineligible
 

Connect Fabric Link

{primary:node0}
lab@srx345-1> show chassis cluster status   
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring             
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring
 
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures
 
Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None          
node1  0        lost           n/a     n/a      n/a    ------ both RGs go to lost status

Redundancy group: 1 , Failover count: 1                                               
node0  0        primary        no      no       IF            
node1  0        lost           n/a     n/a      n/a    ------ both RGs go to lost status
 

Autorecovery Finished

{primary:node0}
lab@srx345-1> show chassis cluster status   
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring             
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring
 
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures
 
Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None          
node1  1        secondary      no      no       None          
 
Redundancy group: 1 , Failover count: 1
node0  100      primary        no      no       None          
node1  1        secondary      no      no       None  
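
When watching for these transitions from a script, the text output of `show chassis cluster status` can be parsed into a per-RG, per-node structure. The following is a minimal sketch, assuming the column layout shown in the outputs above; `parse_cluster_status` is a hypothetical helper, not a Junos or Juniper library function, and it keeps only the first monitor-failure code on each row. The sample text is the status captured while autorecovery was in progress.

```python
import re

# Matches "Redundancy group: 0 , Failover count: 1"
RG_HEADER = re.compile(r"^Redundancy group:\s*(\d+)")
# Matches "node1  0  lost  n/a  n/a  n/a"; anything after the sixth
# column (extra failure codes, annotations) is ignored.
NODE_ROW = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_cluster_status(text):
    """Return {rg_id: {node: {"priority", "status", "failures"}}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = RG_HEADER.match(line)
        if header:
            current = int(header.group(1))
            groups[current] = {}
            continue
        row = NODE_ROW.match(line)
        if row and current is not None:
            node, priority, status, _preempt, _manual, failures = row.groups()
            groups[current][node] = {
                "priority": int(priority),
                "status": status,
                "failures": failures,
            }
    return groups


SAMPLE = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None
node1  0        lost           n/a     n/a      n/a

Redundancy group: 1 , Failover count: 1
node0  0        primary        no      no       IF
node1  0        lost           n/a     n/a      n/a
"""

status = parse_cluster_status(SAMPLE)
# Collect every (RG, node) pair currently reported as lost.
lost = [(rg, node) for rg, nodes in status.items()
        for node, info in nodes.items() if info["status"] == "lost"]
print(lost)  # [(0, 'node1'), (1, 'node1')]
```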
 

chassisd log messages on the secondary node

Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting coredump on
Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting soft-restart on
Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting pfeman-reconnect on
Feb 26 17:50:16 LCC: fpc_online_now - slot 0 - Online
Feb 26 17:50:16 CHASSISD_SNMP_TRAP3: ENTITY trap generated: entStateOperEnabled (entPhysicalIndex 10, entStateAdmin 4, entStateAlarm 6)
……
Feb 26 17:51:05 LCC: .. power sequencer started ..
Feb 26 17:51:05 LCC: ch_fru_power_sequencer FPC 0 step 0
Feb 26 17:51:05 LCC: FPC 0 power is on
Feb 26 17:51:05 LCC: ... power sequencer finished ...
Feb 26 17:51:05 CHASSISD_RECONNECT_SUCCESSFUL: Successfully reconnected on soft restart
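
To estimate how long the soft-restart took, the timestamps of the first and last relevant messages can be compared. The sketch below assumes that the "setting soft-restart on" message marks the start and that CHASSISD_RECONNECT_SUCCESSFUL marks the end; both markers are taken from the excerpt above, but treating them as the exact boundaries is an assumption. Syslog timestamps carry no year, so the year passed in is a placeholder, and `soft_restart_seconds` is a hypothetical helper name.

```python
from datetime import datetime

START_MARKER = "setting soft-restart on"        # assumed start of the restart
END_MARKER = "CHASSISD_RECONNECT_SUCCESSFUL"    # assumed end of the restart


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line."""
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def soft_restart_seconds(lines, year):
    """Seconds between the start and end markers, or None if either is missing."""
    start = end = None
    for line in lines:
        if start is None and START_MARKER in line:
            start = parse_stamp(line, year)
        elif start is not None and END_MARKER in line:
            end = parse_stamp(line, year)
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


LOG = [
    "Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting soft-restart on",
    "Feb 26 17:51:05 LCC: FPC 0 power is on",
    "Feb 26 17:51:05 CHASSISD_RECONNECT_SUCCESSFUL: Successfully reconnected on soft restart",
]

# The year is a placeholder; the article does not state it.
print(soft_restart_seconds(LOG, 2018))  # 49.0
```

For the excerpt above, this gives roughly 49 seconds during which the affected node reports the lost status.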

 