[SRX] Secondary-hold time differs between RG0 failovers

[KB27450]


Summary:

This article explains why the secondary-hold time can differ from one failover to the next when redundancy group 0 (RG0) is failed over on an SRX chassis cluster.

 

Symptoms:

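When manual failovers of RG0 are performed, the time a node spends in the secondary-hold state varies from one failover to the next.

In this scenario, the node stays in secondary-hold for either about 10 seconds or slightly more than 1 minute, as shown in the RG0 events below: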

root@SRX5800-A> show chassis cluster information
node0:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
May 6 14:45:43.076 : secondary->primary, reason: Remote node is in secondary hol
May 6 14:52:10.982 : primary->secondary-hold, reason: Manual failover <--------
May 6 14:52:20.985 : secondary-hold->secondary, reason: Back to back failover interval <--------- about 10s
May 6 15:10:51.709 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:18:45.922 : primary->secondary-hold, reason: Manual failover <--------
May 6 15:19:55.967 : secondary-hold->secondary, reason: Back to back failover interval <-------- more than 1 minute
May 6 15:44:20.042 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:51:41.248 : primary->secondary-hold, reason: Manual failover
May 6 15:51:51.250 : secondary-hold->secondary, reason: Back to back failover interval
May 7 10:25:35.583 : secondary->primary, reason: Remote node is in secondary hol

Redundancy group: 1, Threshold: 255, Monitoring failures: none
Events:
May 6 14:30:38.976 : secondary->primary, reason: Remote node is in secondary hol
May 6 14:31:23.432 : primary->secondary-hold, reason: Manual failover 
May 6 14:31:24.432 : secondary-hold->secondary, reason: Back to back failover interval      
May 6 14:40:47.752 : secondary->primary, reason: Remote node is in secondary hol
May 6 14:51:58.968 : primary->secondary-hold, reason: Manual failover
May 6 14:51:59.969 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:10:16.673 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:42:50.679 : primary->secondary-hold, reason: Manual failover
May 6 15:42:51.679 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:44:06.027 : secondary->primary, reason: Remote node is in secondary hol

node1:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
May 6 14:46:38.256 : secondary-hold->secondary, reason: Back to back failover interval
May 6 14:51:56.139 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:10:36.793 : primary->secondary-hold, reason: Manual failover
May 6 15:10:46.822 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:18:30.978 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:44:04.999 : primary->secondary-hold, reason: Manual failover
May 6 15:45:15.002 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:51:26.179 : secondary->primary, reason: Remote node is in secondary hol
May 7 10:25:16.287 : primary->secondary-hold, reason: Manual failover
May 7 10:26:26.290 : secondary-hold->secondary, reason: Back to back failover interval

Redundancy group: 1, Threshold: 255, Monitoring failures: none
Events:
May 6 14:30:25.213 : secondary-hold->secondary, reason: Back to back failover interval
May 6 14:31:08.667 : secondary->primary, reason: Remote node is in secondary hol
May 6 14:40:32.949 : primary->secondary-hold, reason: Manual failover
May 6 14:40:33.951 : secondary-hold->secondary, reason: Back to back failover interval
May 6 14:51:44.125 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:10:01.759 : primary->secondary-hold, reason: Manual failover
May 6 15:10:02.760 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:42:35.643 : secondary->primary, reason: Remote node is in secondary hol
May 6 15:43:50.985 : primary->secondary-hold, reason: Manual failover
May 6 15:43:51.986 : secondary-hold->secondary, reason: Back to back failover interval
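
The secondary-hold durations in the node0 RG0 events above can be checked with a short script. The following is a minimal Python sketch, not a Juniper tool: the helper name hold_durations is hypothetical, the event lines are copied from the output above with the arrow annotations removed, and the year is a placeholder because the CLI output does not include one.

```python
from datetime import datetime

# RG0 events for node0, copied from the output above (annotations removed).
EVENTS = """\
May 6 14:52:10.982 : primary->secondary-hold, reason: Manual failover
May 6 14:52:20.985 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:18:45.922 : primary->secondary-hold, reason: Manual failover
May 6 15:19:55.967 : secondary-hold->secondary, reason: Back to back failover interval
May 6 15:51:41.248 : primary->secondary-hold, reason: Manual failover
May 6 15:51:51.250 : secondary-hold->secondary, reason: Back to back failover interval
"""

# Assumption: the output carries no year, so a placeholder is used.
# Only differences within the same day are computed, so the value is irrelevant.
PLACEHOLDER_YEAR = "2000"


def hold_durations(text):
    """Return the seconds spent in secondary-hold for each enter/leave pair."""
    durations = []
    entered = None
    for line in text.splitlines():
        stamp, sep, rest = line.partition(" : ")
        if not sep:
            continue  # not an event line
        when = datetime.strptime(
            PLACEHOLDER_YEAR + " " + stamp.strip(), "%Y %b %d %H:%M:%S.%f"
        )
        transition = rest.split(",")[0].strip()
        if transition.endswith("->secondary-hold"):
            entered = when
        elif transition.startswith("secondary-hold->") and entered is not None:
            durations.append((when - entered).total_seconds())
            entered = None
    return durations


print([round(d, 3) for d in hold_durations(EVENTS)])  # [10.003, 70.045, 10.002]
```

The three results are about 10 seconds, about 70 seconds, and about 10 seconds, which matches the annotations in the output.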

Here is the configuration of the cluster:

root@SRX5800-A> show configuration chassis           
cluster {
    reth-count 5;
    control-ports {
        fpc 8 port 0;
        fpc 20 port 0;
    }
    redundancy-group 0 {
        node 0 priority 200;
        node 1 priority 100;
        hold-down-interval 10;
    }
    redundancy-group 1 {
        node 0 priority 200;
        node 1 priority 100;
        interface-monitor {
            ge-3/1/0 weight 128;
            ge-15/1/0 weight 128;
            ge-15/1/1 weight 128;
            ge-3/1/1 weight 128;
        }
    }
}

 

Cause:

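The hold-down-interval for RG0 is set to 10 seconds in the configuration (the default is 300 seconds). This timer controls how long RG0 stays in the "secondary-hold" state before it moves to "secondary". When the secondary-hold timer expires, JSRPD checks the GRES kernel state. If the GRES kernel state is not ready, JSRPD extends the timer by an additional 60 seconds and then checks again, repeating this process as needed. As a result, the observed hold time is about 10 seconds when the GRES kernel state is ready at the first check, and about 70 seconds (10 seconds plus one 60-second extension) when it is not.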
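
The timer behavior described above can be expressed as simple arithmetic. The following Python sketch is only an illustration of that description, not JSRPD code: the function name secondary_hold_seconds and its parameters are hypothetical, and the 60-second extension is taken from the explanation in this article.

```python
def secondary_hold_seconds(hold_down_interval, failed_gres_checks, extension=60):
    """Model the total time spent in secondary-hold, in seconds.

    hold_down_interval: configured hold-down-interval for the redundancy group.
    failed_gres_checks: number of times the GRES kernel state was found
        not ready when the timer expired (hypothetical input for this model).
    extension: seconds added after each failed check (60 per this article).
    """
    if failed_gres_checks < 0:
        raise ValueError("failed_gres_checks must be zero or greater")
    return hold_down_interval + failed_gres_checks * extension


# GRES kernel state ready at the first check: the configured 10 seconds.
print(secondary_hold_seconds(10, 0))  # 10

# GRES kernel state not ready once: 10 seconds plus one 60-second extension.
print(secondary_hold_seconds(10, 1))  # 70
```

With the configured hold-down-interval of 10 seconds, the model gives 10 and 70 seconds, which corresponds to the two hold times seen in the RG0 events.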

 

Solution:

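This is expected behavior. The secondary-hold time for RG0 can differ between failovers, depending on whether the GRES kernel state is ready when the hold-down timer expires.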

 

Modification History:

2019-07-12: Article reviewed for accuracy; article found to be relevant and accurate

 
