
[SRX] Troubleshooting steps if the Chassis Cluster is in Primary/Lost State


Article ID: KB20672 KB Last Updated: 29 Jun 2020Version: 6.0
Summary:

The goal of this article is to troubleshoot a Chassis Cluster that is in the Primary/Lost state and restore it to the healthy Primary/Secondary state.

Symptoms:
  • The Chassis Cluster is not working; it is in the Primary/Lost state.
  • One member is in the Primary state and the other member is in the Lost state.
Solution:

Step 1  Run the command show chassis cluster status on either node to verify the Chassis Cluster status:

{primary:node0}
root@SRX> show chassis cluster status 
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   100         primary        no       no  
    node1                   0           lost           no       no  

Redundancy group: 1 , Failover count: 1
    node0                   100         primary        no       no  
    node1                   0           lost           no       no

Do you see one node with the status of primary and one node with the status of lost?
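If you are checking many devices, the status output above can also be screened programmatically. The following is a minimal sketch (not a Juniper tool): it splits each node row of the sample output into its five columns and flags any node whose Status column reads "lost".

```python
# Sketch: flag nodes in the "lost" state from "show chassis cluster status"
# output. The sample text is taken from the output shown above; node rows
# follow the layout: <node> <priority> <status> <preempt> <manual failover>.

SAMPLE = """\
Redundancy group: 0 , Failover count: 1
    node0                   100         primary        no       no
    node1                   0           lost           no       no
"""

def lost_nodes(output):
    """Return the set of node names whose Status column reads 'lost'."""
    lost = set()
    for line in output.splitlines():
        fields = line.split()
        # Node rows have exactly five whitespace-separated columns.
        if len(fields) == 5 and fields[0].startswith("node"):
            if fields[2] == "lost":
                lost.add(fields[0])
    return lost

print(sorted(lost_nodes(SAMPLE)))  # → ['node1']
```

A healthy Primary/Secondary cluster yields an empty set, so this check can feed a monitoring alert.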

 

Step 2  Note which node is in the lost state; in this example, it is node1. Is that node powered on?

  • Yes - Both nodes are powered on. Proceed to Step 4.
  • No   - Power on the device (node that is in lost state) and proceed to Step 3.


Step 3  Once both nodes are powered up, run the show chassis cluster status command again.

Before re-checking, also verify that all FPCs show as Online:

root@SRX> show chassis fpc pic-status ## All PICs should show as Online
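On a healthy node, every slot and PIC in the inventory should report Online. The sample below is illustrative only; slot numbers and PIC descriptions vary by SRX platform:

```
node0:
--------------------------------------------------------------------------
Slot 0   Online       FPC
  PIC 0  Online       8x GE Base PIC
```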

Do you still see the node in the lost state?

  • Yes - Proceed to Step 4.
  • No -  The cluster has recovered; no further action is needed.


Step 4  Are you able to access the node shown as lost via its console port? (That is, via the physical console, not telnet or SSH.)

  • Yes - Continue with Step 5.
  • No -  If the device is at a remote location, arrange console access to the node for further troubleshooting. If you have console access but do not see any output, the underlying hardware may have a problem. Open a case with your technical support representative for further troubleshooting; refer to KB21781 - [SRX] Data Collection Checklist - Logs/data to collect for troubleshooting.


Step 5  On the other node (that is, the one that is not showing as lost), connect to the console and run the command show chassis cluster status. Does the cluster status also show that node as primary and the other as lost?

  • Yes - This could be a split-brain scenario. Isolate the device shown as lost from the network by removing all cables except the Control and Fabric links. Proceed with Step 6.
  • No -   Proceed with Step 6.


Step 6  Is this node a replacement unit?


Step 7  Do you have a switch connected between the nodes acting as a cluster?

 

Step 8  Create a backup of the configuration from the node that is currently primary, and copy it to the node that was in the lost state.
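A minimal way to create the backup on the primary node (the filename is illustrative):

```
{primary:node0}
root@SRX> show configuration | save /var/tmp/Primary_saved.conf
```

If the lost node is not reachable over the network, you can instead display the configuration on the primary with show configuration | no-more and copy the text, so it can be pasted through the console of the lost node in the load step below.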

Once you have the backup copy of the configuration from the primary device, proceed to loading this configuration on the node that is showing as lost.

When you console to the lost node, you may see its state as either primary or hold/disabled. The latter state occurs only when a Fabric link failure preceded the device going into the lost state; to troubleshoot it, see KB20687 - How to troubleshoot a Fabric Link that is down on a Chassis Cluster.
{primary:node1}[edit]
root@SRX# load override <terminal or filename>


If you are using the "terminal" statement, paste the complete configuration into the window and press Ctrl+D at the end of the configuration.
If you are using the "filename" statement, provide the path to the configuration file and press Enter (for example, /var/tmp/Primary_saved.conf).


Once this new configuration is loaded, commit the changes. If the problem persists, also replace the existing Control and/or Fabric link cables on this device with new cables and reboot the node using the following command:

root@SRX> request system reboot

If the issue still persists after the reboot, proceed to Step 9.


Step 9  Collect the logs specified in KB21781 - [SRX] Data Collection Checklist - Logs/data to collect for troubleshooting and open a case with your technical support representative.

Modification History:
2020-06-29: Removed J-Series references.