This article provides an example of HA RG (Redundancy Group) state transitions during autorecovery of the fabric link.
If the fabric link goes down, RG1+ becomes ineligible on the secondary node (or the node with failures) by default.
Fabric link autorecovery was introduced in Junos OS 12.1X46-D20. Once the fabric link is back up, the status returns to normal without a reboot. However, the RGs do pass through the lost status on the secondary node (or the node with failures) during autorecovery.
Because of the fabric link autorecovery feature, there is no need to reboot the secondary node to restore the status; instead, all FPCs on the secondary node (or the node with failures) automatically soft-restart.
Due to the FPC auto soft-restart, all interfaces on the secondary node (or the node with failures), including the control link, temporarily go down. Both RG0 and RG1+ change to the lost status.
Example:
Disconnect Fabric Link
lab@srx345-1> show chassis cluster status
Monitor Failure codes:
CS Cold Sync monitoring FL Fabric Connection monitoring
GR GRES monitoring HW Hardware monitoring
IF Interface monitoring IP IP monitoring
LB Loopback monitoring MB Mbuf monitoring
NH Nexthop monitoring NP NPC monitoring
SP SPU monitoring SM Schedule monitoring
CF Config Sync monitoring
Cluster ID: 1
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 100 primary no no None
node1 0 secondary no no FL
Redundancy group: 1 , Failover count: 1
node0 100 primary no no None
node1 0 ineligible no no FL ------ RG1 becomes ineligible
Connect Fabric Link
{primary:node0}
lab@srx345-1> show chassis cluster status
Monitor Failure codes:
CS Cold Sync monitoring FL Fabric Connection monitoring
GR GRES monitoring HW Hardware monitoring
IF Interface monitoring IP IP monitoring
LB Loopback monitoring MB Mbuf monitoring
NH Nexthop monitoring NP NPC monitoring
SP SPU monitoring SM Schedule monitoring
CF Config Sync monitoring
Cluster ID: 1
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 100 primary no no None
node1 0 lost n/a n/a n/a ------ both RGs change to lost
Redundancy group: 1 , Failover count: 1
node0 0 primary no no IF
node1 0 lost n/a n/a n/a ------ both RGs change to lost
Autorecovery Finished
{primary:node0}
lab@srx345-1> show chassis cluster status
Monitor Failure codes:
CS Cold Sync monitoring FL Fabric Connection monitoring
GR GRES monitoring HW Hardware monitoring
IF Interface monitoring IP IP monitoring
LB Loopback monitoring MB Mbuf monitoring
NH Nexthop monitoring NP NPC monitoring
SP SPU monitoring SM Schedule monitoring
CF Config Sync monitoring
Cluster ID: 1
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 100 primary no no None
node1 1 secondary no no None
Redundancy group: 1 , Failover count: 1
node0 100 primary no no None
node1 1 secondary no no None
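The three snapshots above can also be checked programmatically. The following is a minimal Python sketch, not a Juniper tool: it assumes the text of `show chassis cluster status` has already been captured as a string, and the function names `parse_cluster_status` and `recovery_phase` are hypothetical. The phase labels are inferred only from the outputs shown in this article; a node can be ineligible or lost for reasons other than a fabric link failure.

```python
import re

# Patterns are based on the output shown in this article; other Junos
# releases may format these lines differently (assumption).
RG_HEADER = re.compile(r"^Redundancy group:\s*(\d+)\s*,\s*Failover count:\s*(\d+)")
NODE_ROW = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_cluster_status(text):
    """Return {rg_id: {node: {"priority", "status", "monitor_failures"}}}."""
    groups = {}
    current_rg = None
    for raw in text.splitlines():
        line = raw.strip()
        header = RG_HEADER.match(line)
        if header:
            current_rg = int(header.group(1))
            groups[current_rg] = {}
            continue
        row = NODE_ROW.match(line)
        if row and current_rg is not None:
            name, priority, status, _preempt, _manual, failures = row.groups()
            # Only the first monitor-failure code is kept; anything after it
            # on the line (extra codes, annotations) is ignored.
            groups[current_rg][name] = {
                "priority": int(priority),
                "status": status,
                "monitor_failures": failures,
            }
    return groups


def recovery_phase(groups, node):
    """Classify a node using only the three states shown in this article."""
    statuses = {rg: nodes[node]["status"] for rg, nodes in groups.items() if node in nodes}
    if not statuses:
        return "unknown"
    if all(status == "lost" for status in statuses.values()):
        return "autorecovery in progress"
    if any(status == "ineligible" for rg, status in statuses.items() if rg >= 1):
        return "ineligible"
    if all(status in ("primary", "secondary") for status in statuses.values()):
        return "recovered"
    return "unknown"


# Redundancy group lines copied from the "Connect Fabric Link" snapshot.
SAMPLE = """\
Redundancy group: 0 , Failover count: 1
node0 100 primary no no None
node1 0 lost n/a n/a n/a
Redundancy group: 1 , Failover count: 1
node0 0 primary no no IF
node1 0 lost n/a n/a n/a
"""

print(recovery_phase(parse_cluster_status(SAMPLE), "node1"))  # autorecovery in progress
```

Run against the first snapshot, the same function would report node1 as "ineligible"; against the last one, "recovered".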
chassisd log messages on the secondary node
Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting coredump on
Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting soft-restart on
Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting pfeman-reconnect on
Feb 26 17:50:16 LCC: fpc_online_now - slot 0 - Online
Feb 26 17:50:16 CHASSISD_SNMP_TRAP3: ENTITY trap generated: entStateOperEnabled (entPhysicalIndex 10, entStateAdmin 4, entStateAlarm 6)
……
Feb 26 17:51:05 LCC: .. power sequencer started ..
Feb 26 17:51:05 LCC: ch_fru_power_sequencer FPC 0 step 0
Feb 26 17:51:05 LCC: FPC 0 power is on
Feb 26 17:51:05 LCC: ... power sequencer finished ...
Feb 26 17:51:05 CHASSISD_RECONNECT_SUCCESSFUL: Successfully reconnected on soft restart
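To spot this sequence in a saved chassisd log, the two markers visible above ("setting soft-restart on" and CHASSISD_RECONNECT_SUCCESSFUL) can be searched for. The Python sketch below is illustrative only: the helper names are hypothetical, treating the first marker as the start of the soft restart is an assumption drawn from this one excerpt, and the result is simply the gap between the two log lines, not a statement about how long autorecovery normally takes.

```python
import re
from datetime import datetime

# Matches "Feb 26 17:50:16 <message>" as seen in the excerpt above.
LOG_LINE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_stamp(stamp):
    # The log stamps carry no year. A placeholder leap year is prepended
    # (assumption) so that Feb 29 still parses; logs that cross a year
    # boundary are not handled.
    return datetime.strptime("2000 " + " ".join(stamp.split()), "%Y %b %d %H:%M:%S")


def soft_restart_duration(lines):
    """Seconds between the first soft-restart marker and the last reconnect, or None."""
    start = end = None
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue
        stamp, message = match.groups()
        if start is None and "setting soft-restart on" in message:
            start = parse_stamp(stamp)
        elif "CHASSISD_RECONNECT_SUCCESSFUL" in message:
            end = parse_stamp(stamp)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Two lines copied from the log excerpt above.
SAMPLE_LOG = [
    "Feb 26 17:50:16 LCC: fru_set_boolean: send: set_boolean_cmd FPC 0 setting soft-restart on",
    "Feb 26 17:51:05 CHASSISD_RECONNECT_SUCCESSFUL: Successfully reconnected on soft restart",
]

print(soft_restart_duration(SAMPLE_LOG))  # 49.0
```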