[ScreenOS] What is NSRP split brain? Why are both of my firewalls in the Master state?

Article ID: KB11450  KB Last Updated: 11 Sep 2020  Version: 10.0
Summary:

This article provides information about the NSRP split brain scenario.

Symptoms:
  • Both firewalls are in the Master state, that is, in split brain (Active/Passive firewalls using VSD 0).
  • Traffic is not passing through the active/passive firewall that is configured with NSRP.

If both Active/Passive firewalls are the Master at the same time, they could be in a split-brain state. To confirm this, run the get nsrp command on each firewall. If PB is shown as none, the firewalls are in a split-brain state.

SSG550 (M)-> get nsrp
(snip)
master always exist: disabled
group priority preempt holddown inelig   master       PB other members
    0       90 no             3 no       myself     none 
total number of vsd groups: 1
Notes:
  • If the VSD group (the group column in the output above) is 1 or another non-zero number and both of the firewalls are the Master, then the firewalls may not be in split-brain mode.
  • The NSRP cluster id (shown at the top of get nsrp) must be the same value for both Firewall-A and Firewall-B.  If the cluster ID values are different, then both of the firewalls will also be in the Master state. Correct this by making the NSRP cluster ID the same via the set nsrp cluster id <value> command.
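
The check described above can be sketched in Python. This is only an illustration, not a Juniper tool: it assumes the VSD table columns appear in the order shown in the sample output (group, priority, preempt, holddown, inelig, master, PB), and the function names are hypothetical.

```python
def vsd_rows(get_nsrp_output):
    """Map VSD group number -> (master, PB) from `get nsrp` text output.

    Assumes the column order shown in the sample output:
    group priority preempt holddown inelig master PB.
    """
    rows = {}
    for line in get_nsrp_output.splitlines():
        fields = line.split()
        # A VSD row starts with two numeric columns (group and priority).
        if len(fields) >= 7 and fields[0].isdigit() and fields[1].isdigit():
            rows[int(fields[0])] = (fields[5], fields[6])
    return rows


def claims_master_without_backup(get_nsrp_output, group=0):
    """True if this unit reports itself as master and PB as none."""
    return vsd_rows(get_nsrp_output).get(group) == ("myself", "none")


def split_brain(output_a, output_b, group=0):
    """True only if BOTH units claim mastership and see no primary backup."""
    return all(
        claims_master_without_backup(out, group) for out in (output_a, output_b)
    )
```

The output of both firewalls is required because a single unit reporting `myself` and `none` only shows that it cannot see a peer; split brain is the case where each unit reports this at the same time.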
Solution:

In NSRP Active/Passive mode, both firewalls are configured with a single VSD group: one firewall is the Master (Active) of the VSD group and the other is the Slave (Passive). In NSRP Active/Active mode, each firewall is configured with two VSD groups, so each firewall is the Master for one VSD group and the Slave for the other.

When firewalls running NSRP lose connectivity to each other, both firewalls may become the Master of the same VSD Group at the same time. This condition is known as split-brain. Split brain is a highly undesirable condition as it may cause intermittent or complete outage of traffic flow.

To resolve the split brain issue, you must ensure that at least one HA link connection is restored, which will allow the exchange of NSRP hello messages or heartbeats.

If the HA links are directly connected, perform the following procedure:

  1. Issue the command "get nsrp ha-link":

    NSPROD1(M)-> get nsrp ha-link
    total_ha_port = 2
    probe on ha-link is disabled
    unused    channel: ethernet8 (ifnum: 11)  mac: 0010db1d1e8b state: down
    unused    channel: ethernet7 (ifnum: 10)  mac: 0010db1d1e8a state: down
    ha control link not available
    ha data link not available
    ha secondary path link not available

    In the above output, both HA links (ethernet7 and ethernet8) are down. This is the typical scenario in which split brain happens.

  2. Issue the command "get nsrp" on both the master and backup.  Note that both firewalls only see themselves as the master, and they do not have a PB (Primary Backup):

    NSPROD1(M)-> get nsrp
    (snip)

    group priority preempt holddown inelig   master       PB other members
        0      115 no             3 no       myself     none
    total number of vsd groups: 1


    NSPROD2(M)-> get nsrp
    (snip)

    group priority preempt holddown inelig   master       PB other members
        0      120 no             3 no       myself     none
    total number of vsd groups: 1
  3. To correct this, at least one of the HA links must be restored. Check the cables. As soon as the HA link ethernet7 is restored, it becomes the NSRP control channel and the split brain is corrected. The firewall with the lower priority value (115 in this example) becomes the master, while the firewall with the higher priority value (120) becomes the slave.

    NSPROD1(M)-> get nsrp h
    total_ha_port = 2
    probe on ha-link is disabled
    unused    channel: ethernet8 (ifnum: 11)  mac: 0010db1d1e8b state: down

    control   channel: ethernet7 (ifnum: 10)  mac: 0010db1d1e8a state: up
    ha data link not available
    ha secondary path link not available


    NSPROD1(M)-> get nsrp
    (snip)

    group priority preempt holddown inelig   master       PB other members
        0      115 no             3 no       myself  6666496
    total number of vsd groups: 1


    NSPROD2(B)-> get nsrp
    (snip)

    group priority preempt holddown inelig   master       PB other members
        0      120 no             3 no      6666880   myself    
    total number of vsd groups: 1
  4. Alternatively, a secondary HA path can be configured on BOTH firewalls.

    NSPROD1(M)-> set nsrp secondary-path e1
    NSPROD1(M)-> get nsrp ha-link
    total_ha_port = 2
    probe on ha-link is disabled
    unused    channel: ethernet8 (ifnum: 11)  mac: 0010db1d1e8b state: down
    unused    channel: ethernet7 (ifnum: 10)  mac: 0010db1d1e8a state: down
    secondary path channel: ethernet1 (ifnum: 0)  mac: 0010db1d1e80 state: up
    ha control link not available
    ha data link not available
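
The "get nsrp ha-link" outputs in the procedure above can also be checked programmatically. The following Python sketch is an illustration only: it assumes the channel lines keep the layout shown in the samples, and the helper names are hypothetical.

```python
import re

# Matches lines such as:
#   unused    channel: ethernet8 (ifnum: 11)  mac: 0010db1d1e8b state: down
#   secondary path channel: ethernet1 (ifnum: 0)  mac: 0010db1d1e80 state: up
CHANNEL_RE = re.compile(
    r"^\s*(?P<role>.+?)\s+channel:\s+(?P<ifname>\S+)\s+\(ifnum:\s*\d+\)"
    r"\s+mac:\s*(?P<mac>[0-9a-fA-F]+)\s+state:\s*(?P<state>\S+)"
)


def parse_ha_links(ha_link_output):
    """Return one dict per channel line found in `get nsrp ha-link` output."""
    links = []
    for line in ha_link_output.splitlines():
        match = CHANNEL_RE.match(line)
        if match:
            links.append(
                {
                    "role": match.group("role"),
                    "ifname": match.group("ifname"),
                    "state": match.group("state"),
                }
            )
    return links


def usable_paths(ha_link_output):
    """Interfaces whose state begins with 'up' (covers 'up' and 'up(probe)')."""
    return [
        link["ifname"]
        for link in parse_ha_links(ha_link_output)
        if link["state"].startswith("up")
    ]
```

An empty result from `usable_paths` corresponds to the situation in step 1, where no path is left for NSRP heartbeats.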

If the HA links are connected through a layer 2 switch, follow the steps below to troubleshoot:

  1. Ensure that the NSRP HA probe feature is enabled.

    NSPROD1(M)-> set nsrp ha-link probe

    NSPROD1(M)-> get nsrp ha-link
    total_ha_port = 2
    probe on ha-link is enabled, interval 1s, threshold 5
    unused    channel: ethernet8 (ifnum: 11)  mac: 0010db1d1e8b state: disconnected(probe)
    unused    channel: ethernet7 (ifnum: 10)  mac: 0010db1d1e8a state: disconnected(probe)
    secondary path channel: ethernet1 (ifnum: 0)  mac: 0010db1d1e80 state: up(probe)
    ha control link not available
    ha data link not available

    Note: In addition, confirm the duplex settings on the switch and firewall interfaces match.  
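
The probe settings reported above (interval 1s, threshold 5) can be pictured as a counter of consecutive unanswered probes. The Python sketch below is a simplified model, not ScreenOS code: it assumes that a link is marked disconnected(probe) once the number of consecutive missed probes reaches the threshold, and that a single reply resets the count. The class and method names are hypothetical.

```python
class ProbedLink:
    """Toy model of an HA link monitored by probes (assumed semantics)."""

    def __init__(self, name, threshold=5):
        self.name = name
        self.threshold = threshold  # consecutive misses tolerated
        self.missed = 0

    def record_probe(self, replied):
        """Call once per probe interval with whether a reply arrived."""
        self.missed = 0 if replied else self.missed + 1

    @property
    def state(self):
        # The physical link may still be up; only logical reachability is modeled.
        if self.missed >= self.threshold:
            return "disconnected(probe)"
        return "up(probe)"
```

Under these assumptions, with a 1 second interval and a threshold of 5, a switch that silently stops forwarding heartbeats would be detected after roughly five seconds, even though the physical port stays up.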

Steps to prevent split brain:

  1. Configure NSRP with 2 dedicated HA links.

    When both the links are up, one of the links is used for the NSRP control channel while the other one is the data channel (synchronization of RTOs). In the event of a failure of the control link, the data link carries both data and control traffic if the link is 1 Gbps or above. If not, the data link becomes the control link.
  2. Configure the HA probe feature ONLY IF the HA links are connected through a layer 2 switch.

    The HA link probe is a function used for determining the health of an HA link. By default, the physical state of the HA link is used to determine whether heartbeats should be sent and expected on the link. When the physical state of the first HA link goes down, NSRP control messages begin to be exchanged on the second HA link (assuming one exists). This assumes that the firewalls are connected back to back, which is not always the case. If there is an intermediate switching layer, the physical links can sometimes remain up even though heartbeats cannot be received. In this scenario, by default, both devices will attempt to become the master (split brain), and connectivity problems will likely result. To address this, the HA link probe adds a logical connectivity test to the HA links so that, if such a failure occurs, heartbeat messages first fail over to the second HA link.
  3. Configure a secondary path, which is essentially a third NSRP HA link to be used to elect a VSD-Group master if for some reason both dedicated HA links were to fail.

    The secondary path differs from the standard HA interfaces in that only Hello packets are sent on it to elect a master; it is meant to prevent split brain and nothing more. Because the secondary path uses a forwarding interface, it is strongly recommended that message authentication and encryption be enabled, as messages will travel over a shared interface. The authentication and encryption settings must be configured on each device. The secondary path is the failsafe for cases where all HA links are down. It is not used for synchronization of RTOs and is invoked only after multiple failovers. When invoked, a master is elected, and no RTOs are synchronized until an HA link is restored. In an Active/Passive environment, an HA data link is not required, as there should never be any asymmetric traffic. Nevertheless, it is worth having such a link so that you can continue to synchronize sessions if your primary HA link goes down.
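
The failover behavior described in the prevention steps above can be summarized in a short Python sketch. It is a simplified reading of the text, not ScreenOS logic: the function names are hypothetical, and the assumption that RTO synchronization pauses when a slower data link is repurposed as the control link is an inference, since the article only says that the data link "becomes the control link".

```python
def after_control_link_failure(data_link_mbps):
    """Traffic carried by the surviving data link once the control link fails."""
    if data_link_mbps >= 1000:
        # 1 Gbps or above: the data link carries both kinds of traffic.
        return ["control", "data"]
    # Below 1 Gbps: the data link becomes the control link.
    # Assumption: RTO synchronization pauses in this case.
    return ["control"]


def after_both_ha_links_fail(secondary_path_up):
    """What remains possible when both dedicated HA links are down."""
    if secondary_path_up:
        # Only Hello packets use the secondary path: a master can be
        # elected, but no RTOs are synchronized.
        return {"master_election": True, "rto_sync": False}
    # No heartbeat path at all: both units may become master (split brain).
    return {"master_election": False, "rto_sync": False}
```

For example, `after_control_link_failure(100)` returns only control traffic, which is one reason the article still suggests a second dedicated HA link in an Active/Passive deployment.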
Modification History:
2020-09-11: Minor non-technical edits.
