[ScreenOS] The NSRP Master is bouncing between the 'primary' and 'secondary' cluster members

Article ID: KB11388  KB Last Updated: 28 Aug 2020  Version: 6.0
Summary:

The NSRP cluster is continually failing over: the Master role keeps changing between the cluster members. Why is this happening?

Symptoms:

There are a number of reasons why a cluster may fail over, but continual failover is unusual. There are a couple of possible explanations for this type of behavior:

  • An ongoing condition causing the current Master to crash. 
    One sequence for this would be:
     
    1. The Master (primary) experiences an unexpected condition and crashes; therefore, the Backup (secondary) takes over.
    2. After a period, the secondary (as Master) also crashes, at which time the primary (Backup) is already recovered and again becomes Master. If the trigger condition continues to exist, the devices will continue to crash.

    If NSRP preempt is enabled, then as a slight variation to the sequence above, it is possible only the primary device is experiencing the problem. The sequence for this would be:
    1. The Master (primary) crashes; therefore, the Backup (secondary) takes over.
    2. The primary recovers and preempt causes it to resume the Master role.
    3. The trigger condition recurs, causing the Master (primary) to crash again.

    Troubleshooting steps for both scenarios are the same and are outlined in the solution section below.

  • NSRP monitoring/tracking is intermittently causing only one device to become "Inoperable".
    For tracking to cause the cluster to bounce back and forth, the trigger would likely need to be a flapping event that affects only one box (otherwise both would be Inoperable), and NSRP preempt would also need to be enabled. Possible causes include the following:
     
    • Physical link unique to one of the firewalls is flapping
    • Flapping path to a tracked host is affecting only one firewall
    • Failing host is being tracked by only one firewall
    • Common interface or host is flapping, but the NSRP weightings or thresholds are mismatched
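
The last cause above (mismatched weights or thresholds) can be illustrated with a small model. This is only a sketch, not ScreenOS behavior verified against a device: it assumes a member becomes Inoperable once the summed weights of its failed monitored objects reach its threshold, and the `Member` class and the weight values for the two members are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    threshold: int                               # device-level monitoring threshold
    weights: dict = field(default_factory=dict)  # monitored object -> weight

    def weighted_sum(self, failed):
        # Add up the weights of the monitored objects that are currently failed.
        return sum(w for obj, w in self.weights.items() if obj in failed)

    def inoperable(self, failed):
        # Assumption: the member fails once the sum reaches its threshold.
        return self.weighted_sum(failed) >= self.threshold


# Hypothetical cluster: same interface monitored, but with different weights.
primary = Member("primary", threshold=255, weights={"ethernet5": 255})
secondary = Member("secondary", threshold=255, weights={"ethernet5": 200})

# A common interface goes down on both members.
failed = {"ethernet5"}
result = {m.name: m.inoperable(failed) for m in (primary, secondary)}
print(result)
```

In this model the same outage makes only the primary Inoperable, so with preempt enabled a flapping common interface could move the Master role back and forth.
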
Solution:

If the cluster devices are crashing, then consult the following link; it contains information on how to confirm / test the issue and the next steps towards correcting the situation:

If NSRP monitoring/tracking is causing the bounces, then check the following:

Check if both devices are monitoring the same objects (interfaces, hosts). Also check weight and threshold values between the devices.

Note: Monitoring and Track-ip settings (objects, weights, thresholds) are not synchronized by NSRP, so it is possible that only one of the members is tracking a failing host.
ns208(M)-> get config | incl nsrp
set nsrp cluster id 7
set nsrp vsd-group id 0 priority 100
set nsrp monitor interface ethernet5 weight 200
set nsrp monitor track-ip ip
set nsrp monitor track-ip ip 172.27.18.180
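
Because these settings are not synchronized, it can help to compare the output of `get config | incl nsrp` captured from each member. The following Python sketch does that offline on saved text. The helper names are made up for illustration, and the second device's configuration is a hypothetical example, not output from the article.

```python
def monitor_lines(config_text):
    """Return the set of 'set nsrp monitor ...' lines found in saved CLI output."""
    lines = set()
    for raw in config_text.splitlines():
        line = " ".join(raw.split())  # normalize spacing before comparing
        if line.startswith("set nsrp monitor"):
            lines.add(line)
    return lines


def compare_monitoring(config_a, config_b):
    """Return (lines only on A, lines only on B), each sorted."""
    a, b = monitor_lines(config_a), monitor_lines(config_b)
    return sorted(a - b), sorted(b - a)


# Output taken from the example above.
device_a = """\
set nsrp cluster id 7
set nsrp vsd-group id 0 priority 100
set nsrp monitor interface ethernet5 weight 200
set nsrp monitor track-ip ip
set nsrp monitor track-ip ip 172.27.18.180
"""

# Hypothetical peer configuration that lacks the track-ip settings.
device_b = """\
set nsrp cluster id 7
set nsrp vsd-group id 0 priority 50
set nsrp monitor interface ethernet5 weight 200
"""

only_a, only_b = compare_monitoring(device_a, device_b)
for line in only_a:
    print("only on device A:", line)
for line in only_b:
    print("only on device B:", line)
```

Any line reported for only one device points at a monitored object, weight, or threshold that the other member does not share.
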

Check the Event Log for NSRP tracking or interface events.
ns208(M)-> get event incl track
Date       Time     Module Level  Type Description
2008-04-12 02:30:17 system crit  00062 Track IP IP address 172.27.18.180
                                       failed.

2008-04-12 02:30:16 system crit  00062 No interface/route enables the Track
                                       IP IP address 172.27.18.180 to be
                                       transmitted.
2008-04-12 02:30:15 system notif 00050 Track IP IP address 172.27.18.180
                                       added with an interval of 1 seconds, a
                                       threshold of 3, a weight of 1 on
                                       interface auto using method ping.
2008-04-12 02:30:07 system notif 00050 Track IP enabled
Total entries matched = 4
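
When the log is long, counting Track IP failures per address can make a flapping host easier to spot. The sketch below works on saved text from `get event incl track`. It assumes the layout shown above (a dated first line followed by indented continuation lines), and the function name and regular expressions are illustrative only.

```python
import re
from collections import Counter

# First line of an entry: date, time, module, level, type, start of description.
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s+(\d+)\s+(.*)$"
)
TRACK_FAILED = re.compile(r"Track IP IP address (\S+) failed")


def parse_events(text):
    """Join wrapped description lines and return one dict per log entry."""
    events = []
    for raw in text.splitlines():
        m = ENTRY_START.match(raw)
        if m:
            date, time, module, level, type_id, desc = m.groups()
            events.append({"when": f"{date} {time}", "level": level,
                           "type": type_id, "desc": desc.strip()})
        elif events and raw.startswith(" ") and raw.strip():
            # Indented line: continuation of the previous description.
            events[-1]["desc"] += " " + raw.strip()
    return events


log = """\
Date       Time     Module Level  Type Description
2008-04-12 02:30:17 system crit  00062 Track IP IP address 172.27.18.180
                                       failed.

2008-04-12 02:30:16 system crit  00062 No interface/route enables the Track
                                       IP IP address 172.27.18.180 to be
                                       transmitted.
2008-04-12 02:30:15 system notif 00050 Track IP IP address 172.27.18.180
                                       added with an interval of 1 seconds, a
                                       threshold of 3, a weight of 1 on
                                       interface auto using method ping.
2008-04-12 02:30:07 system notif 00050 Track IP enabled
Total entries matched = 4
"""

events = parse_events(log)
failures = Counter()
for event in events:
    match = TRACK_FAILED.search(event["desc"])
    if match:
        failures[match.group(1)] += 1

for address, count in failures.items():
    print(address, "failed", count, "time(s)")
```

A high failure count for one address on only one member would be consistent with the one-sided tracking causes listed in the Symptoms section.
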

Check the NSRP monitoring status on both boxes (compare the weights and total values). 
ns208(M)-> get nsrp monitor
device based nsrp monitoring threshold: 255, weighted sum: 200, not failed
device based nsrp monitor interface: ethernet5(weight 200, DOWN)
device based nsrp monitor zone:
device based nsrp track ip: (weight: 255, enabled, not failed)
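
To compare the two boxes quickly, the summary line of `get nsrp monitor` can be reduced to three values. This sketch assumes the summary line always has the form shown above. The function name is hypothetical, and the "headroom" figure is just the threshold minus the current weighted sum, not a value ScreenOS reports.

```python
import re

SUMMARY = re.compile(
    r"monitoring threshold:\s*(\d+),\s*weighted sum:\s*(\d+),\s*(not failed|failed)"
)


def parse_monitor_summary(output):
    """Extract threshold, weighted sum, and failed state from saved CLI output."""
    m = SUMMARY.search(output)
    if m is None:
        raise ValueError("no monitoring summary line found")
    threshold, weighted_sum, state = m.groups()
    return {
        "threshold": int(threshold),
        "weighted_sum": int(weighted_sum),
        "failed": state == "failed",
    }


output = """\
device based nsrp monitoring threshold: 255, weighted sum: 200, not failed
device based nsrp monitor interface: ethernet5(weight 200, DOWN)
device based nsrp monitor zone:
device based nsrp track ip: (weight: 255, enabled, not failed)
"""

summary = parse_monitor_summary(output)
headroom = summary["threshold"] - summary["weighted_sum"]
print(summary, "headroom:", headroom)
```

In the example output, ethernet5 is down and contributes 200 of the 255 needed, so any further failure weighted 55 or more would reach the threshold on this member.
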

For more information on determining what triggered the firewall to enter the Inoperable state, refer to KB11338 - Firewall running NSRP is in the (I) Inoperable state. How do I check what triggered it to this state and how do I fix it?

Modification History:
2020-08-27: Minor, non-technical edits.
