
[J/SRX] How do interface and IP monitoring in a chassis cluster affect threshold and cause failover?


Article ID: KB29510  Last Updated: 06 Oct 2014  Version: 1.0
Summary:

This article clarifies how IP and interface monitoring values such as priority, threshold, and weights work, and describes how these values affect failover in a Chassis Cluster for SRX.

Symptoms:

Cause:

Solution:

The three example cases below show how these values interact and how they determine whether a failover is triggered in an SRX chassis cluster.

Case 1: global-weight is less than the redundancy-group threshold (255), with no interface-monitor failure.

Configuration:

{primary:node0}[edit]
root@primarynode# show | display set | match ip-mon
set chassis cluster redundancy-group 1 ip-monitoring global-weight 210 <<<<< This value is subtracted from the redundancy-group threshold when ip-monitoring fails, that is, when the weight of the unreachable monitored IPs exceeds the global-threshold.
set chassis cluster redundancy-group 1 ip-monitoring global-threshold 110
set chassis cluster redundancy-group 1 ip-monitoring family inet 11.11.11.2 weight 120 <<<<< This weight must exceed the global-threshold (here 110) for the ip-monitoring status to change, so when monitoring a single IP it makes sense to configure a weight greater than the global-threshold (see the sketch after this configuration for the multiple-IP case).
set chassis cluster redundancy-group 1 ip-monitoring family inet 11.11.11.2 interface reth1.0 secondary-ip-address 11.11.11.100
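When more than one IP address is monitored, ip-monitoring is only declared failed once the combined weight of the unreachable addresses exceeds the global-threshold. The following sketch is not from this article; it assumes a second, hypothetical address (11.11.11.3) and splits the weight across the two addresses:

set chassis cluster redundancy-group 1 ip-monitoring family inet 11.11.11.2 weight 60
set chassis cluster redundancy-group 1 ip-monitoring family inet 11.11.11.3 weight 60

With the global-threshold still at 110, one unreachable address (weight 60) is not enough to fail ip-monitoring; only when both are unreachable (60 + 60 = 120, which exceeds 110) is the global-weight of 210 subtracted from the redundancy-group threshold.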

IP Monitoring status down:

{primary:node0}[edit]
root@primarynode# run show chassis cluster ip-monitoring status
node0:
--------------------------------------------------------------------------
Redundancy group: 1

IP address Status Failure count Reason
11.11.11.2 unreachable 3 unknown

node1:
--------------------------------------------------------------------------
Redundancy group: 1

IP address Status Failure count Reason
11.11.11.2 unreachable 4 no route to host

To check the threshold value:

{primary:node0}[edit]
root@primarynode# run show chassis cluster information detail
node0:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
Sep 13 15:56:04.223 : hold->secondary, reason: Hold timer expired
Sep 13 15:56:59.408 : secondary->primary, reason: Control & Fabric links down

Redundancy group: 1, Threshold: 45, Monitoring failures: ip-monitoring <<<<< The threshold is 255 when all monitored items are up. When ip-monitoring fails, the global-weight defined under ip-monitoring is subtracted from the threshold. The threshold is also reduced by interface-monitoring failures, and when it reaches 0 the node priority becomes 0.
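Putting the Case 1 numbers together:

1. 11.11.11.2 is unreachable, and its weight (120) exceeds the global-threshold (110), so ip-monitoring is declared failed.
2. The global-weight (210) is subtracted from the redundancy-group threshold: 255 - 210 = 45.
3. The threshold (45) is still greater than 0, so the node priority is unchanged and no failover occurs.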

To check the priority value:

{primary:node0}[edit]
root@primarynode# run show chassis cluster status
Cluster ID: 10
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 0
node0 200 primary no no
node1 10 secondary no no

Redundancy group: 1 , Failover count: 0
node0 200 primary yes no
node1 10 secondary yes no

Note: In the example above, the priority is unaffected because the threshold is still 45, which is greater than 0, so no failover is triggered.


Case 2: global-weight is equal to the redundancy-group threshold (255), with no interface-monitor failure.

If the global-weight is changed to 255, an ip-monitoring failure alone is enough to change the priority on the node:

{primary:node0}[edit]
root@primarynode# ...r redundancy-group 1 ip-monitoring global-weight 255

{primary:node0}[edit]
root@primarynode# run show chassis cluster status
Cluster ID: 10
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 0
node0 200 primary no no
node1 10 secondary no no

Redundancy group: 1 , Failover count: 43
node0 0 primary yes no <<<< The priority is 0 because the monitored IP is unreachable from both nodes, per ip-monitoring.
node1 0 secondary yes no <<<< The priority is 0 because the monitored IP is unreachable from both nodes, per ip-monitoring.

{primary:node0}[edit]
root@primarynode# run show chassis cluster ip-monitoring status
node0:
--------------------------------------------------------------------------
Redundancy group: 1

IP address Status Failure count Reason
11.11.11.2 unreachable 3 unknown

node1:
--------------------------------------------------------------------------
Redundancy group: 1

IP address Status Failure count Reason
11.11.11.2 unreachable 4 no route to host

{primary:node0}[edit]
root@primarynode# run show chassis cluster information detail
node0:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
Sep 13 15:56:04.223 : hold->secondary, reason: Hold timer expired
Sep 13 15:56:59.408 : secondary->primary, reason: Control & Fabric links down


Redundancy group: 1, Threshold: 0, Monitoring failures: ip-monitoring <<<<< The threshold is now 0, so the node priority changed to 0.
Events:
Sep 13 17:15:26.582 : secondary->primary, reason: Remote node is in secondary hol
Sep 13 17:15:32.671 : primary->secondary-hold, reason: Monitor failed: IF IP
Sep 13 17:15:33.683 : secondary-hold->secondary, reason: Back to back failover interval
Sep 13 17:15:35.053 : secondary->primary, reason: Remote node is in secondary hol
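The Case 2 arithmetic is the same as Case 1, but with the global-weight raised to 255 the single ip-monitoring failure now exhausts the threshold: 255 - 255 = 0. Because the threshold has reached 0, the priority of each node with the failure drops to 0, as shown in the cluster status output above.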


Case 3: global-weight is less than the redundancy-group threshold (255), with an interface-monitor failure.

Change in configuration:

{primary:node0}[edit]
root@primarynode# show | display set | match ip-mon
set chassis cluster redundancy-group 1 ip-monitoring global-weight 210 <<<< global-weight changed back to 210.

Before the ip-monitoring failure, with global-weight 210:
{primary:node0}[edit]
root@primarynode# run show chassis cluster status
Cluster ID: 10
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 0
node0 200 primary no no
node1 10 secondary no no

Redundancy group: 1 , Failover count: 0
node0 200 primary yes no
node1 10 secondary yes no

{primary:node0}[edit]
root@primarynode# run show chassis cluster information detail
node0:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
Sep 13 15:56:04.223 : hold->secondary, reason: Hold timer expired
Sep 13 15:56:59.408 : secondary->primary, reason: Control & Fabric links down

Redundancy group: 1, Threshold: 127, Monitoring failures: interface-monitoring <<<<< The threshold is 127 because a monitored interface with weight 128 is down (255 - 128 = 127).
Events:
Sep 13 17:15:35.053 : secondary->primary, reason: Remote node is in secondary hol
Sep 13 17:15:41.268 : primary->secondary-hold, reason: Monitor failed: IF IP
Sep 13 17:15:42.274 : secondary-hold->secondary, reason: Back to back failover interval
Sep 13 17:17:54.614 : secondary->primary, reason: Remote node is in secondary hol
---(more)---[abort]

To check interface-monitor status:
{primary:node0}[edit]
root@primarynode# run show chassis cluster interfaces
Control link status: Up

Control interfaces:
Index Interface Status
0 fxp1 Up

Fabric link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 fe-0/0/5 Up
fab0
fab1 fe-2/0/5 Up
fab1

Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 2
reth1 Up 1

Interface Monitoring:
Interface Weight Status Redundancy-group
fe-0/0/3 128 Down 1 <<<<<<<<<<<
fe-2/0/3 128 Up 1
fe-2/0/2 128 Up 1
fe-0/0/2 128 Up 1
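The interface-monitoring configuration that produces this status output is not shown in this article; based on the interfaces and weights listed above, it would look similar to the following assumed sketch:

set chassis cluster redundancy-group 1 interface-monitor fe-0/0/3 weight 128
set chassis cluster redundancy-group 1 interface-monitor fe-0/0/2 weight 128
set chassis cluster redundancy-group 1 interface-monitor fe-2/0/3 weight 128
set chassis cluster redundancy-group 1 interface-monitor fe-2/0/2 weight 128

Because each monitored interface carries a weight of 128, a single down interface reduces the redundancy-group threshold from 255 to 127.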

In this situation, even though the global-weight (210) is less than 255, an ip-monitoring failure combined with the existing interface-monitoring failure can still cause a failover, as shown below:

{primary:node0}[edit]
root@primarynode# run show chassis cluster status
Cluster ID: 10
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 0
node0 200 primary no no
node1 10 secondary no no

Redundancy group: 1 , Failover count: 1
node0 0 secondary yes no <<<<< Because of the combined ip-monitoring and interface-monitoring failures on node0 for RG1, the threshold dropped below 0, so node0's priority is 0 and node1 is now primary.
node1 10 primary yes no <<<<< The priority for node1 did not change to 0 because its threshold is still 45 (only ip-monitoring failed on node1).

The output below shows the resulting threshold on node0:
{primary:node0}[edit]
root@primarynode# run show chassis cluster information detail
node0:
--------------------------------------------------------------------------
Redundancy mode:
Configured mode: active-active
Operational mode: active-active

Redundancy group: 0, Threshold: 255, Monitoring failures: none
Events:
Sep 13 15:56:04.223 : hold->secondary, reason: Hold timer expired
Sep 13 15:56:59.408 : secondary->primary, reason: Control & Fabric links down

Redundancy group: 1, Threshold: -83, Monitoring failures: interface-monitoring, ip-monitoring <<<<< Threshold is 127-210 = -83.
Events:
Sep 13 17:15:42.274 : secondary-hold->secondary, reason: Back to back failover interval
Sep 13 17:17:54.614 : secondary->primary, reason: Remote node is in secondary hol
Sep 13 17:39:05.064 : primary->secondary-hold, reason: Monitor failed: IF IP
Sep 13 17:39:06.076 : secondary-hold->secondary, reason: Back to back failover interval
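The Case 3 arithmetic combines both failures on node0: interface-monitoring first reduces the threshold to 255 - 128 = 127, and the ip-monitoring failure then subtracts the global-weight, 127 - 210 = -83. Because the threshold is now at or below 0, node0's priority becomes 0 and RG1 fails over to node1.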

Note: The examples below show the reth interface status when child links go down, with and without interface-monitoring enabled for those child links.

Without interface-monitoring configured: if the child links of a reth on a node go down, the reth interface goes down as well, as shown below:

{primary:node0}[edit]
root@primarynode# run show chassis cluster interfaces
Control link status: Up

Control interfaces:
Index Interface Status
0 fxp1 Up

Fabric link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 fe-0/0/5 Up
fab0
fab1 fe-2/0/5 Up
fab1

Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 2
reth1 Down 1 <<<<< reth1 is down, yet node0 remains primary and reth1 on node0 continues to be used, because without interface-monitoring the chassis cluster status does not change.

The priorities stay at their configured values:

{primary:node0}[edit]
root@primarynode# run show chassis cluster status
Cluster ID: 10
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 0
node0 200 primary no no
node1 10 secondary no no

Redundancy group: 1 , Failover count: 2
node0 200 primary yes no
node1 10 secondary yes no

With interface-monitoring enabled:

{primary:node0}[edit]
root@primarynode# run show chassis cluster status
Cluster ID: 10
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 1
node0 200 primary no no
node1 10 secondary no no

Redundancy group: 1 , Failover count: 2
node0 0 secondary yes no
node1 10 primary yes no <<<<< Failover is triggered for the corresponding RG.


{primary:node0}[edit]
root@primarynode# run show chassis cluster interfaces
Control link status: Up

Control interfaces:
Index Interface Status
0 fxp1 Up

Fabric link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 fe-0/0/5 Up
fab0
fab1 fe-2/0/5 Up
fab1

Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 2
reth1 Up 1 <<<<< reth1 is up because the child links of reth1 on node1 are now being used.

Interface Monitoring:
Interface Weight Status Redundancy-group
fe-0/0/3 128 Down 1 <<<<< Child link on node0 is down.
fe-2/0/3 128 Up 1
fe-2/0/2 128 Up 1
fe-0/0/2 128 Down 1 <<<<< Child link on node0 is down.
