Recovering a QFabric Director (DG) that is stuck in the "Cluster Init" state
[Cluster Initialization]: Cluster node is shutting down
[Cluster Initialization]: Stopping cluster service operations
[Cluster Initialization]: Stopping compute node services
[Cluster Initialization]: Stopping cluster resources
[Cluster Initialization]: Shutting down the cluster
[Cluster Initialization]: Migrating dcfservice:dcf_svc to dg1
[Cluster Initialization]: Migrating service:pbccif_svc0 to dg1
[Cluster Initialization]: Migrating service:pblb_dhcp_svc0 to dg1
[Cluster Initialization]: Stopping NFS services on local node: dg0
nfsd: last server has exited
nfsd: unexporting all filesystems
[Cluster Initialization]: Forcing stop of CCIF services on local node: dg0
[Cluster Initialization]: Stopping GFS services on local node: dg0
[Cluster Initialization]: Cluster services shut down sucessfully
[Cluster Initialization]: Successful cluster shutdown.
- This condition can occur when the GFS filesystem encounters an I/O error. The DG then logs messages similar to the following:
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: fatal: I/O error
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: block = 22452641
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: function = gfs_bd_ail_tryremove
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.h, line = 184
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: time = 1552874718
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: about to withdraw from the cluster
dg kernel: GFS: fsid=sfc_pb_cluster:pbfs.0: telling LM to withdraw
dg cn_monitor: Failed to fetch clustered storage available space
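To check whether a DG has hit this condition, its log can be searched for the GFS error signature shown above. The following is a minimal sketch, not part of the original procedure. It assumes that kernel messages are written to /var/log/messages, and the function name gfs_withdraw_lines is hypothetical.

```shell
#!/bin/sh
# Print GFS fatal-error and withdraw messages found in a log file.
# The default path /var/log/messages is an assumption; pass another
# file as the first argument if the DG logs elsewhere.
gfs_withdraw_lines() {
    logfile="${1:-/var/log/messages}"
    # Match the three key lines of the signature: the fatal I/O error,
    # the withdraw announcement, and the lock manager notification.
    grep -E 'GFS: fsid=[^ ]+: (fatal: I/O error|about to withdraw from the cluster|telling LM to withdraw)' "$logfile"
}

# Usage (exit status is non-zero when no matching line is found):
# gfs_withdraw_lines /var/log/messages
```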
- To recover the DG, first isolate it from the cluster by shutting it down. The following command can be used:
[root@dg1 ~]# shutdown -h now
Broadcast message from root (ttyS0) (Wed Mar 20 17:52:16 2019):
The system is going down for system halt NOW!
- Once the system is fully halted, remove the power cables from the back of the DG and make sure that no input is being sent from the console.
- Remove all network cables (CPE) from the DG, noting which cable connects to which device so that they can be reconnected correctly.
- Once the DG is isolated, reconnect power and boot it as a standalone node, fully separated from the QFabric cluster.
- After the reboot, the DG prompts for a CentOS boot option. Select the highlighted option:

- Enter the GRUB menu and edit the current kernel boot options: press ESC to cancel the automated boot process, select the top kernel option, and press “e” to edit it. A line similar to the following appears:
kernel /vmlinuz-2.6.18-410.el5 ro root=/dev/VolGroup00/LogVol00 console=ttyS0 crashkernel=128M@16M acpi=off reboot=b,w,k,f,p,t quiet
- Edit this line and add “single” to the end (see below), then continue with the system boot. This brings the DG up in single-user mode.
kernel /vmlinuz-2.6.18-410.el5 ro root=/dev/VolGroup00/LogVol00 console=ttyS0 crashkernel=128M@16M acpi=off reboot=b,w,k,f,p,t quiet single
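After the system comes up, it can be useful to confirm that the added boot option took effect. The following is a minimal sketch, assuming the kernel arguments are exposed in /proc/cmdline as on a standard Linux system. The function name booted_single is hypothetical.

```shell
#!/bin/sh
# Return success if the kernel command line contains the word "single".
# Reads /proc/cmdline by default; a file path can be passed instead,
# which is mainly useful for testing.
booted_single() {
    cmdline_file="${1:-/proc/cmdline}"
    # -w matches "single" only as a whole word, -q suppresses output
    grep -qw 'single' "$cmdline_file"
}

# Usage:
# booted_single && echo "single-user boot option is active"
```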
- Once the DG is in single-user mode, run fsck.gfs. The following command recovers the DG from the Cluster Init state:
/sbin/fsck.gfs -y /dev/sda3
- Once the recovery is complete, collect the necessary logs and proceed accordingly.
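The fsck.gfs step above can be wrapped so that it refuses to run against a mounted device and keeps a copy of its output for later log collection. This is a minimal sketch under stated assumptions, not the documented procedure: the function names is_mounted and run_gfs_fsck and the log path /root/fsck_gfs.log are hypothetical, and /dev/sda3 is taken from the command above.

```shell
#!/bin/sh
# Return success if the given device appears as a mount source.
# Reads /proc/mounts by default; a file path can be passed as the
# second argument, which is mainly useful for testing.
is_mounted() {
    dev="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts_file"
}

# Run fsck.gfs only if the device is not mounted, and save the output.
# /root/fsck_gfs.log is a hypothetical location; adjust as needed.
# Note: the pipeline's exit status is that of tee, not of fsck.gfs.
run_gfs_fsck() {
    dev="${1:-/dev/sda3}"
    if is_mounted "$dev"; then
        echo "$dev is mounted; unmount it before running fsck.gfs" >&2
        return 1
    fi
    /sbin/fsck.gfs -y "$dev" 2>&1 | tee /root/fsck_gfs.log
}

# Usage:
# run_gfs_fsck /dev/sda3
```

The saved output can then be gathered together with the other logs from the DG.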