Resolution Guide - SRX - Troubleshooting steps when the Chassis Cluster does not come up



Article ID: KB20641   KB Last Updated: 29 Jun 2020   Version: 13.0

This article provides step-by-step procedures for troubleshooting a Chassis Cluster in which a node is in a Hold or Disabled state. This article is part of the Resolution Guides and Articles - SRX - High Availability (Chassis Cluster).



  • Chassis Cluster is not coming up
  • Chassis Cluster is not in Primary/Secondary State
  • Both HA (High Availability) members are in Primary State
  • One HA member is in the Hold, Disabled, or Secondary-Hold state
If your Chassis Cluster is up and running and you simply want to verify that it is in a healthy state, refer to KB15439 - Verify Chassis Cluster is in healthy state or Verifying the Chassis Cluster Configuration.

Perform the following steps to troubleshoot your Chassis Cluster.


Step 1  Are you configuring the Chassis Cluster for the first time?

  • Yes - Continue with Step 2
  • No   - Proceed to Step 3 to begin troubleshooting
    (Selecting No means that the Chassis Cluster was previously up, and it went down due to reasons which need to be investigated.)

Step 2  For a new configuration, check the following to make sure that the basic configuration guidelines are followed:

  1. Confirm the Hardware and Software requirements for your Chassis Cluster. Refer to the following articles to make sure the basic software and hardware requirements are satisfied in your Chassis Cluster scenario:
    KB16141 - Minimum hardware and software requirements for a Chassis Cluster
    KB15425 - Are licenses needed for each node of a Chassis Cluster?
  2. Make sure that the cabling is correct, and that the Control and Fabric links are up. Direct connections between the two nodes are recommended for both the Control and Fabric links.
    //sample output showing the control and fabric links as up
    root@J-SRX> show interfaces terse | match fxp 
    fxp0                    up    up  
    fxp0.0                  up    up   inet     
    fxp1                    up    up  
    fxp1.0                  up    up   inet    
    fxp2                    up    up  
    fxp2.0                  up    up   tnp      0x1100001
    root@J-SRX> show interfaces terse | match fab 
    ge-0/0/2.0              up    up   aenet    --> fab0.0
    ge-9/0/2.0              up    up   aenet    --> fab1.0
    fab0                    up    up  
    fab0.0                  up    up   inet  
    fab1                    up    up  
    fab1.0                  up    up   inet  

    Note: The Control and Fabric links differ with the hardware platforms. Make sure that the correct ports are used for connecting the Control and Fabric links.

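    If you find that the Control or Fabric links are showing down, refer to the following articles to troubleshoot this issue further:
    KB20687 - How to troubleshoot a Fabric Link that is down on a Chassis Cluster
    KB20698 - How to troubleshoot a Control Link that is down on a Chassis Cluster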

  3. Confirm the Chassis Cluster configuration.  Refer to KB15439 - How do I verify chassis cluster nodes are configured and up on SRX.

  4. Reboot both Chassis Cluster nodes simultaneously. This should ensure a clean cluster state. If the issue is still not resolved, proceed to Step 3.
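
The link check in item 2 can be automated roughly as follows. This is a minimal sketch, assuming the 'show interfaces terse' output has already been captured as text and follows the column layout of the sample above. The function names link_states and down_links are hypothetical helpers for illustration, not part of any Juniper tool.

```python
def link_states(terse_output, prefixes=("fxp", "fab")):
    """Return {interface: (admin, link)} for physical interfaces.

    Only names starting with one of the prefixes are kept. Logical
    units such as fxp1.0 and CLI prompt lines are skipped.
    """
    prefixes = tuple(prefixes)
    states = {}
    for line in terse_output.splitlines():
        fields = line.split()
        # Need at least name, admin state, and link state; skip logical units.
        if len(fields) < 3 or "." in fields[0]:
            continue
        name, admin, link = fields[:3]
        if name.startswith(prefixes):
            states[name] = (admin, link)
    return states


def down_links(states):
    """Return the sorted names of interfaces that are not up/up."""
    return sorted(name for name, state in states.items() if state != ("up", "up"))


# Sample text copied from the output shown in item 2.
SAMPLE = """\
root@J-SRX> show interfaces terse | match fxp
fxp0                    up    up
fxp0.0                  up    up   inet
fxp1                    up    up
fxp1.0                  up    up   inet
fxp2                    up    up
fxp2.0                  up    up   tnp      0x1100001
root@J-SRX> show interfaces terse | match fab
ge-0/0/2.0              up    up   aenet    --> fab0.0
ge-9/0/2.0              up    up   aenet    --> fab1.0
fab0                    up    up
fab0.0                  up    up   inet
fab1                    up    up
fab1.0                  up    up   inet
"""

states = link_states(SAMPLE)
print(down_links(states) or "all control and fabric links are up")
```

An empty list from down_links means that every fxp and fab interface in the captured text is up/up. Any name it returns is a candidate for the Control or Fabric link troubleshooting articles.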

Step 3  Run the command 'show chassis cluster status' to check the current status of the Chassis Cluster:

root@J-SRX> show chassis cluster status 
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   100         secondary      no       no  
    node1                   1           primary        no       no  

Redundancy group: 1 , Failover count: 1
    node0                   100         secondary      no       no  
    node1                   1           primary        no       no  
  1. Do you see a Cluster ID in the Chassis Cluster output (as shown above)?

  2. Do you see both node0 and node1 in the output (as shown above)?

    • Yes - Proceed with Step 6.
    • No   - Proceed with Step 4.
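
The two questions above can also be answered from captured output with a short script. The following is a sketch only, assuming the text layout of the sample 'show chassis cluster status' output in this step. The names parse_cluster_status and next_step are hypothetical, and the decision covers only question 2, because that is the only question with a stated Yes/No branch.

```python
import re


def parse_cluster_status(output):
    """Return (cluster_id, {redundancy_group: {node: status}})."""
    cluster_id = None
    groups = {}
    current = None
    for line in output.splitlines():
        m = re.match(r"\s*Cluster ID:\s*(\d+)", line)
        if m:
            cluster_id = int(m.group(1))
            continue
        m = re.match(r"\s*Redundancy group:\s*(\d+)", line)
        if m:
            current = groups.setdefault(int(m.group(1)), {})
            continue
        # Node rows look like: node0  100  secondary  no  no
        m = re.match(r"\s*(node\d+)\s+(\d+)\s+(\S+)", line)
        if m and current is not None:
            current[m.group(1)] = m.group(3)
    return cluster_id, groups


def next_step(groups):
    """Step 6 if every redundancy group lists node0 and node1, else Step 4."""
    both_visible = bool(groups) and all(
        {"node0", "node1"} <= set(nodes) for nodes in groups.values()
    )
    return "Step 6" if both_visible else "Step 4"


# Sample text copied from the output shown in this step.
SAMPLE = """\
root@J-SRX> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   100         secondary      no       no
    node1                   1           primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   100         secondary      no       no
    node1                   1           primary        no       no
"""

cluster_id, groups = parse_cluster_status(SAMPLE)
print("Cluster ID:", cluster_id)
print("Next:", next_step(groups))
```

A cluster_id of None means that no Cluster ID line was found in the captured text, which corresponds to answering No to question 1.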

Step 4  If you do not see both nodes in the cluster status output (as shown in Step 3), the hardware or software components might differ between the two nodes. Are the components the same on both nodes?

        Make sure that the hardware components on both devices are the same, the software versions are the same, and the interfaces used as part of each reth are logically the same.


Step 5  Is the Cluster ID the same on both nodes?  To check the Cluster ID value, connect a console to each node and run the command 'show chassis cluster status'.

  • No   - Set the Cluster ID on both nodes to the same value. This is a requirement. See the note below. Then reboot both devices simultaneously. If this does not resolve the issue, go to Step 7.

    Note: If you have more than one Chassis Cluster on the same switch or L2 domain, each pair of Chassis Cluster nodes must have a different Cluster ID. For example, suppose two pairs of Chassis Cluster nodes are connected to the same switch: Juniper_mktg (node0 and node1) is one pair, and Juniper_eng (node0 and node1) is another pair. Juniper_mktg node0 and node1 may be assigned the Cluster ID of 1. Juniper_eng node0 and node1 may be assigned the Cluster ID of 2; Juniper_eng should not be assigned a Cluster ID of 1 because the other pair is using 1. This is because the reth MAC addresses are calculated from the Cluster ID, so two identical Cluster IDs in the same network can cause a network impact due to overlapping virtual MAC entries.

  • Yes - Run the command 'show chassis cluster interfaces'.  Is the status of both the Control link and the Fabric link Up (as shown below)?
    root@J-SRX> show chassis cluster interfaces
    Control link 0 name: fxp1
    Control link status: Up

    Fabric interfaces:
    Name Child-interface Status
    fab0 ge-0/0/2 up
    fab1 ge-9/0/2 up
    Fabric link status: Up
    Redundant-ethernet Information:     
        Name         Status      Redundancy-group
        reth0        Down        1                
        reth1        Down        1                
        reth2        Down        Not configured                
        reth3        Down        Not configured                              
    Interface Monitoring:
        Interface         Weight    Status    Redundancy-group
        ge-2/0/1          255       Down      1  
        ge-11/0/1         255       Up        1   
        ge-11/0/0         255       Down      1   
        ge-2/0/0          255       Down      1   
    If the Control or Fabric link status is not Up, refer to the following articles to troubleshoot this issue:
    KB20687 - How to troubleshoot a Fabric Link that is down on a Chassis Cluster
    KB20698 - How to troubleshoot a Control Link that is down on a Chassis Cluster
    If both the Control and Fabric links show as Up, return to Step 1 and work through the steps again to narrow down the issue. If the issue persists, go to Step 7 to open a case with JTAC.
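
The Control and Fabric link check in this step can be scripted in the same way. This is a rough sketch, assuming the 'show chassis cluster interfaces' output contains one 'Control link status' line and one 'Fabric link status' line, as in the sample above. Other releases or platforms may print these lines differently. The function names link_status and next_action are hypothetical.

```python
import re


def link_status(output):
    """Return {'control': status, 'fabric': status} from the captured text."""
    status = {}
    for line in output.splitlines():
        m = re.match(r"\s*(Control|Fabric) link status:\s*(\S+)", line)
        if m:
            status[m.group(1).lower()] = m.group(2)
    return status


def next_action(status):
    """Map the link status to the articles or steps named in this step."""
    actions = []
    if status.get("fabric", "").lower() != "up":
        actions.append("KB20687 - troubleshoot the Fabric Link")
    if status.get("control", "").lower() != "up":
        actions.append("KB20698 - troubleshoot the Control Link")
    # Both links up: the article says to rework Step 1 onward, then Step 7.
    return actions or ["Return to Step 1, then go to Step 7 if unresolved"]


# Excerpt copied from the sample output shown in this step.
SAMPLE = """\
root@J-SRX> show chassis cluster interfaces
Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 ge-0/0/2 up
fab1 ge-9/0/2 up
Fabric link status: Up
"""

status = link_status(SAMPLE)
for action in next_action(status):
    print(action)
```

A missing status line is treated the same as a link that is not Up, so an incomplete capture points to the link troubleshooting articles rather than reporting a healthy link.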

Step 6  What is the current state of the Chassis Cluster?  Proceed to the next troubleshooting step based on the states that you see for node0 and node1, respectively.

  1. Primary/Secondary -> This is the expected state for a healthy cluster. Proceed to KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper priority. This is the final check for a healthy cluster state.
  2. Primary/Lost -> Proceed to KB20672 - Troubleshooting steps if the Chassis Cluster in Primary/Lost State.

  3. Primary/Hold -> A possible cause is that the JSRP daemon is stuck on one of the nodes. Either reboot both devices simultaneously, or open a case with your technical support representative. Consult KB21781 - [SRX] Data Collection Checklist - Logs/data to collect for troubleshooting.
  4. Hold/Lost -> Refer to KB27713 - How to recover or prevent a chassis cluster from going into a Hold/Lost state.

  5. Primary/Disabled -> Proceed to KB20697 - Troubleshooting steps if the Chassis Cluster is in Primary/Disabled State.
  6. Primary or Secondary in Hold state -> This could be temporary behavior. Check the output of the command 'show chassis fpc pic-status'. The available PICs on both nodes should show as online. Wait for the PICs to come online on both nodes, after which the status should change to Primary/Secondary. If the situation does not improve, proceed to Step 7.
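
The state list above is effectively a lookup table, which can be sketched as follows. Several choices here are assumptions rather than statements from the article: the lookup ignores which node holds which state, any state ending in "-hold" (for example, secondary-hold) is treated as item 6, and any combination not listed falls through to Step 7. The names NEXT_ACTION, HOLD_ACTION, FALLBACK, and next_action are hypothetical.

```python
# Keys are unordered state pairs (assumption: node order does not matter).
NEXT_ACTION = {
    frozenset(("primary", "secondary")): "KB20673 - verify priority (healthy state)",
    frozenset(("primary", "lost")): "KB20672 - Primary/Lost troubleshooting",
    frozenset(("primary", "hold")): "Reboot both nodes simultaneously or open a case (KB21781)",
    frozenset(("hold", "lost")): "KB27713 - recover from Hold/Lost",
    frozenset(("primary", "disabled")): "KB20697 - Primary/Disabled troubleshooting",
}

# Item 6: a transitional hold state, possibly temporary.
HOLD_ACTION = "Check 'show chassis fpc pic-status', wait for PICs, then Step 7"

# Assumption: unlisted combinations go to Step 7.
FALLBACK = "Step 7"


def next_action(node0_state, node1_state):
    """Return the suggested next action for a pair of node states."""
    states = (node0_state.lower(), node1_state.lower())
    if any(state.endswith("-hold") for state in states):
        return HOLD_ACTION
    return NEXT_ACTION.get(frozenset(states), FALLBACK)


print(next_action("secondary", "primary"))
print(next_action("primary", "disabled"))
```

Using frozenset keys keeps the table to one entry per state pair. If the order of node0 and node1 matters in a given environment, tuple keys would be the more faithful choice.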


Step 7  If you want to determine the cause of a failover, refer to KB21164 - [SRX] Finding out possible reasons for Chassis Cluster failover.

If the above steps do not resolve this problem, refer to KB21781 - [SRX] Data Collection Checklist - Logs/data to collect for troubleshooting to collect the necessary logs from both devices, and open a case with your technical support representative.

Modification History:
2020-06-29: Removed J-Series references.
