Procedure to recover a QFabric DG when it is stuck in the Cluster-Init state

Article ID: KB34050 KB Last Updated: 21 Mar 2019 Version: 1.0
Summary:
This article describes how to recover a QFabric Director (DG) that is stuck in the "Cluster Init" state. In this state, messages similar to the following are displayed:
[Cluster Initialization]: Cluster node is shutting down
[Cluster Initialization]: Stopping cluster service operations
[Cluster Initialization]: Stopping compute node services
[Cluster Initialization]: Stopping cluster resources
[Cluster Initialization]: Shutting down the cluster
[Cluster Initialization]: Migrating dcfservice:dcf_svc to dg1
[Cluster Initialization]: Migrating service:pbccif_svc0 to dg1
[Cluster Initialization]: Migrating service:pblb_dhcp_svc0 to dg1
[Cluster Initialization]: Stopping NFS services on local node: dg0
nfsd: last server has exited
nfsd: unexporting all filesystems
[Cluster Initialization]: Forcing stop of CCIF services on local node: dg0
[Cluster Initialization]: Stopping GFS services on local node: dg0
[Cluster Initialization]: Cluster services shut down sucessfully
[Cluster Initialization]: Successful cluster shutdown.
  • This condition can occur when the GFS filesystem encounters an I/O error, as reported in the DG kernel log:
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: fatal: I/O error
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0:   block = 22452641
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0:   function = gfs_bd_ail_tryremove
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.h, line = 184
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0:   time = 1552874718
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: about to withdraw from the cluster
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: telling LM to withdraw
dg cn_monitor: Failed to fetch clustered storage available space 
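To confirm this condition on a DG, the system log can be searched for the GFS error signatures shown above. The following is a minimal sketch, not an official Juniper tool: the function name `gfs_io_errors` is hypothetical, and the default log path `/var/log/messages` is an assumption based on the usual CentOS syslog layout.

```shell
#!/bin/sh
# Hypothetical helper: print GFS "fatal: I/O error" and "about to withdraw"
# messages found in a syslog file.
# Assumption: kernel messages are written to /var/log/messages.
gfs_io_errors() {
    logfile="${1:-/var/log/messages}"
    grep -E 'GFS: fsid=[^ ]+ (fatal: I/O error|about to withdraw from the cluster)' "$logfile"
}

# Example (run on the DG):
#   gfs_io_errors /var/log/messages
```

The function returns a non-zero status when no matching line is found, so it can also be used in a conditional.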
Solution:
  • To recover the DG, first isolate it from the cluster by shutting it down. The following command can be used:
[root@dg1 ~]# shutdown -h now
Broadcast message from root (ttyS0) (Wed Mar 20 17:52:16 2019):
The system is going down for system halt NOW!
  • Once the system is fully halted, remove the power cables from the back of the DG and make sure no input is sent from the console.
  • Remove all the network cables (CPE) from the DG, and note which cable connects to which device.
  • Once isolated, reboot the DG as a standalone device, fully separated from the QFabric cluster.
  • After the reboot, the DG prompts you to choose a CentOS boot option. Select the entry highlighted in the boot menu:
[Figure: boot menu with the entry to select highlighted]
  • Enter the GRUB menu and edit the current kernel boot options: press ESC to cancel the automated boot process, select the top kernel option, and press “e” to edit it. A line similar to the following is displayed:
 
kernel /vmlinuz-2.6.18-410.el5 ro root=/dev/VolGroup00/LogVol00 console=ttyS0 crashkernel=128M@16M acpi=off reboot=b,w,k,f,p,t quiet
 
  • Add “single” to the end of that line (see below), then proceed with the system boot. This brings the DG up in single-user mode.
 
  kernel /vmlinuz-2.6.18-410.el5 ro root=/dev/VolGroup00/LogVol00 console=ttyS0 crashkernel=128M@16M acpi=off reboot=b,w,k,f,p,t quiet single
 
  • Once in single-user mode, run fsck.gfs on the GFS partition:
/sbin/fsck.gfs -y /dev/sda3
  • This recovers the DG from the Cluster-Init state.
  • Once the recovery is done, collect the necessary logs and proceed accordingly.
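The final steps above can be sketched as a small script that checks its preconditions before running the repair. This is only an illustration under stated assumptions: the helper names `booted_single`, `is_mounted`, and `repair_gfs`, the `DRY_RUN`, `CMDLINE_FILE`, and `MOUNTS_FILE` variables, and the use of `/proc/cmdline` and `/proc/mounts` are not part of the original procedure. Only the `/sbin/fsck.gfs -y /dev/sda3` invocation comes from the article.

```shell
#!/bin/sh
# Hypothetical pre-checks before running fsck.gfs (illustrative only).

# Succeeds if the kernel command line contains the word "single".
booted_single() {
    grep -qw 'single' "${1:-/proc/cmdline}"
}

# Succeeds if device $1 is listed as a mounted source in mounts file $2.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Runs the article's fsck.gfs command once both checks pass.
# With DRY_RUN set, the command is printed instead of executed.
repair_gfs() {
    dev="${1:-/dev/sda3}"
    if ! booted_single "${CMDLINE_FILE:-/proc/cmdline}"; then
        echo "kernel was not booted with 'single'" >&2
        return 1
    fi
    if is_mounted "$dev" "${MOUNTS_FILE:-/proc/mounts}"; then
        echo "$dev is mounted; refusing to run fsck.gfs" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "/sbin/fsck.gfs -y $dev"
    else
        /sbin/fsck.gfs -y "$dev"
    fi
}

# Example (on the isolated DG, in single-user mode):
#   DRY_RUN=1 repair_gfs /dev/sda3   # show the command first
#   repair_gfs /dev/sda3             # then run it
```

Printing the command first with `DRY_RUN=1` gives a chance to confirm the device name before any repair is attempted.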