[vSRX] Creating chassis cluster fails due to configuration sync error

Article ID: KB34771  KB Last Updated: 23 Jul 2019  Version: 1.0

Summary:

This article addresses an issue in the 18.2 release where a vSRX chassis cluster cannot form because of a configuration sync error.

Symptoms:

A standalone vSRX has a full configuration that includes routes, interfaces, and system settings. After it is converted to a cluster, it boots with multiple error messages and one of the nodes becomes disabled due to a configuration sync error.

Example error messages:

Jul  3 18:42:16   last message repeated 5 times
Jul  3 18:42:20   chassisd[9747]: LICENSE_SOCKET_FAILURE: 'evConnect' failed for socket 35:
Jul  3 18:42:21   gkmd[6735]: IKED-PKID socket creation failed
Jul  3 18:42:26   gkmd[6735]: IKED-PKID socket creation failed
Jul  3 18:42:28   lsysd[6852]: LICENSE_SOCKET_FAILURE: 'evConnect' failed for socket 9:
Jul  3 18:42:29   authd[6794]: LICENSE_SOCKET_FAILURE: 'evConnect' failed for socket 10:
Jul  3 18:42:30   kmd[6839]: LICENSE_SOCKET_FAILURE: 'evConnect' failed for socket 26:
Jul  3 18:42:31   gkmd[6735]: IKED-PKID socket creation failed
Jul  3 18:42:51   last message repeated 4 times
Jul  3 18:42:52   chassisd[9747]: CHASSISD_IPC_CONNECTION_DROPPED: Dropped IPC connection for LCC 1
Jul  3 18:42:53   jlaunchd: can not access /usr/sbin/pgmd: No such file or directory
Jul  3 18:42:53   jlaunchd: can not access /usr/sbin/hostname-cached: No such file or directory
Jul  3 18:42:53   jlaunchd: can not access /usr/sbin/tad: No such file or directory
Jul  3 18:42:54   chassisd[9890]: System memory 977547264 bytes is too low to support!
Jul  3 18:42:54   chassisd[9890]: LICENSE_SOCKET_FAILURE: 'evConnect' failed for socket 35:
Jul  3 18:42:54   kernel: rts_ifstate_chk_multi_registration: daemon chassisd(9890) has previously registered 1 time(s)
Jul  3 18:42:54   kernel: rts_ifstate_chk_multi_registration: daemon chassisd(9890) has previously registered 2 time(s)
Jul  3 18:42:55   cli: login_getclass: unknown class 'j-idle-timeout'
Jul  3 18:42:55   last message repeated 3 times
Jul  3 18:42:56   gkmd[6735]: IKED-PKID socket creation failed
  1. The cluster status shows that one node is disabled with monitor failure CF (Config Sync monitoring):

    root# run show chassis cluster status
    
    Monitor Failure codes:
        CS  Cold Sync monitoring        FL  Fabric Connection monitoring
        GR  GRES monitoring             HW  Hardware monitoring
        IF  Interface monitoring        IP  IP monitoring
        LB  Loopback monitoring         MB  Mbuf monitoring
        NH  Nexthop monitoring          NP  NPC monitoring
        SP  SPU monitoring              SM  Schedule monitoring
        CF  Config Sync monitoring      RE  Relinquish monitoring
    
    Cluster ID: 1
    Node   Priority Status               Preempt Manual   Monitor-failures
    
    Redundancy group: 0 , Failover count: 1
    node0  1        primary              no      no       None
    node1  0        disabled             no      no       CF   <-- node1 is disabled and the failure reason is CF
    
    {primary:node0}[edit]
  2. The control link is up, and both sent and received heartbeat packet counters are incrementing:

    root# run show chassis cluster control-plane statistics
    Control link statistics:
        Control link 0:
            Heartbeat packets sent: 1958
            Heartbeat packets received: 1940
            Heartbeat packet errors: 0
    Fabric link statistics:
        Child link 0
            Probes sent: 0
            Probes received: 0
        Child link 1
            Probes sent: 0
            Probes received: 0
    
    root# run show chassis cluster interfaces
    Control link status: Up
    
    Control interfaces:
        Index   Interface   Monitored-Status   Internal-SA   Security
        0       em0         Up                 Disabled      Disabled
    
    Fabric link status: Down
    
    Fabric interfaces:
        Name    Child-interface    Status                    Security
                                   (Physical/Monitored)
        fab0
        fab0
        fab1
        fab1
    
    Redundant-pseudo-interface Information:
        Name         Status      Redundancy-group
        lo0          Up          0
  3. The configuration synchronization status shows that the last sync on node1 failed with multiple file copy errors:

    root# run show chassis cluster information configuration-synchronization
    node0:
    --------------------------------------------------------------------------
    
    Configuration Synchronization:
        Status:
            Activation status: Enabled
            Last sync operation: Auto-Sync
            Last sync result: Not needed
            Last sync mgd messages:
    
        Events:
            Jul  3 18:22:20.794 : Auto-Sync: Not needed.
    
    node1:
    --------------------------------------------------------------------------
    
    Configuration Synchronization:
        Status:
            Activation status: Enabled
            Last sync operation: Auto-Sync
            Last sync result: Failed
            Last sync mgd messages:
                mgd: rcp: /config/juniper.conf: No such file or directory
                <xnm:error xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:xnm="http://xml.juniper.net/xnm/1.1/xnm">
                <message>
                failed to copy file 'node0://var/etc/policy.id' from 'node0'
                </message>
                </xnm:error>
                <xnm:error xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:xnm="http://xml.juniper.net/xnm/1.1/xnm">
                <message>
                failed to copy file 'node0://var/etc/captive_portal.id' from 'node0'
                </message>
                </xnm:error>
                <xnm:error xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:xnm="http://xml.juniper.net/xnm/1.1/xnm">
                <message>
                failed to copy file 'node0://var/etc/vpn_tunnel.id' from 'node0'
                </message>
    <--snip-->
  4. Rebooting one node or both nodes does not resolve the issue.
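
When the configuration-synchronization output is long, it can help to pull out only the files that failed to copy. The following is a minimal Python sketch, not a Juniper tool. It assumes the output of `show chassis cluster information configuration-synchronization` has been captured as text, and that failures appear in the `failed to copy file '...' from '...'` form shown above. The function name `failed_copies` and the `sample` text are illustrative.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   failed to copy file 'node0://var/etc/policy.id' from 'node0'
FAILED_COPY = re.compile(r"failed to copy file '([^']+)' from '([^']+)'")


def failed_copies(cli_output: str) -> List[Tuple[str, str]]:
    """Return (file, source_node) pairs for every failed copy in the text."""
    return FAILED_COPY.findall(cli_output)


# Excerpt modeled on the node1 output in symptom 3.
sample = """\
    Last sync result: Failed
    Last sync mgd messages:
        mgd: rcp: /config/juniper.conf: No such file or directory
        <message>
        failed to copy file 'node0://var/etc/policy.id' from 'node0'
        </message>
        <message>
        failed to copy file 'node0://var/etc/captive_portal.id' from 'node0'
        </message>
"""

for path, node in failed_copies(sample):
    print(f"{node}: {path}")
```

An empty result does not prove that synchronization succeeded. Check the `Last sync result` field as well.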

Cause:

This issue is caused by a configuration conflict on both nodes.

Solution:

To resolve this issue, disable the cluster. Then load the factory default configuration on both nodes and re-form the cluster. 

  1. Connect to each node through the console. Disable the cluster on both nodes and reboot:
    root> set chassis cluster disable reboot
  2. Back up the current configuration to your local machine, then load the factory-default configuration on each node. The commit fails until a root password is set:

    Entering configuration mode:
    [edit]
    root@vSRX-6# load factory-default
    warning: activating factory configuration
    
    [edit]
    root@vSRX-6# commit
    [edit]
      'system'
        Missing mandatory statement: 'root-authentication'
    error: commit failed: (missing mandatory statements)
    
    [edit]
    root@vSRX-6# set system root-authentication plain-text-password
    New password:
    Retype new password:
    
    [edit]
    root@vSRX-6# commit
    commit complete
  3. Re-form cluster on both nodes:

    root@vSRX-5> set chassis cluster cluster-id 1 node 0 reboot
    root@vSRX-6> set chassis cluster cluster-id 1 node 1 reboot
  4. After both nodes boot, the cluster forms correctly:

    root> show chassis cluster status
    Monitor Failure codes:
        CS  Cold Sync monitoring        FL  Fabric Connection monitoring
        GR  GRES monitoring             HW  Hardware monitoring
        IF  Interface monitoring        IP  IP monitoring
        LB  Loopback monitoring         MB  Mbuf monitoring
        NH  Nexthop monitoring          NP  NPC monitoring
        SP  SPU monitoring              SM  Schedule monitoring
        CF  Config Sync monitoring      RE  Relinquish monitoring
    
    Cluster ID: 1
    Node   Priority Status               Preempt Manual   Monitor-failures
    
    Redundancy group: 0 , Failover count: 0
    node0  1        secondary            no      no       None
    node1  1        primary              no      no       None
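
The check in step 4 can also be scripted. The following is a minimal Python sketch, assuming the output of `show chassis cluster status` has been captured as text in the column layout shown in this article. The names `NodeState`, `parse_cluster_status`, and `unhealthy_nodes` are illustrative and not part of any Juniper library. Treating only `primary` and `secondary` as healthy states is an assumption of this sketch.

```python
import re
from typing import List, NamedTuple


class NodeState(NamedTuple):
    group: int             # redundancy group number
    node: str              # e.g. "node0"
    priority: int
    status: str            # e.g. "primary", "secondary", "disabled"
    monitor_failures: str  # "None" or failure codes such as "CF"


NODE_ROW = re.compile(r"node\d+")


def parse_cluster_status(text: str) -> List[NodeState]:
    """Parse node rows from captured 'show chassis cluster status' output."""
    states = []
    group = None
    for line in text.splitlines():
        fields = line.split()
        if line.strip().startswith("Redundancy group:"):
            group = int(fields[2])
        elif group is not None and len(fields) >= 6 and NODE_ROW.fullmatch(fields[0]):
            # Columns: Node, Priority, Status, Preempt, Manual, Monitor-failures
            states.append(NodeState(group, fields[0], int(fields[1]),
                                    fields[2], " ".join(fields[5:])))
    return states


def unhealthy_nodes(states: List[NodeState]) -> List[NodeState]:
    """Return nodes that report a monitor failure or an unexpected status."""
    return [s for s in states
            if s.monitor_failures != "None"
            or s.status not in ("primary", "secondary")]


HEALTHY = """\
Redundancy group: 0 , Failover count: 0
node0  1        secondary            no      no       None
node1  1        primary              no      no       None
"""

FAILED = """\
Redundancy group: 0 , Failover count: 1
node0  1        primary              no      no       None
node1  0        disabled             no      no       CF
"""

for label, output in (("healthy", HEALTHY), ("failed", FAILED)):
    bad = unhealthy_nodes(parse_cluster_status(output))
    print(label, [(s.node, s.status, s.monitor_failures) for s in bad])
```

With the failing output from the Symptoms section, the sketch reports node1 as disabled with CF. With the output from step 4, it reports no unhealthy nodes.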