Users who are managing a Junos Space fabric consisting of two or more Space nodes or DB nodes may observe that the database status under Network Management Platform > Administration > Fabric is showing as "Out of Sync."
This article lists some common errors associated with this status message and provides solutions.
The Junos Space fabric UI shows an "Out of Sync" message, and jmp-dr health reports a MySQL replication error.
Some possible causes are given below. To determine the cause, perform the following:
For a system that is showing "Out of Sync"
From the Junos Space CLI, use the following command to determine the replication error (this command also works on DR site DB nodes):
mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "show slave status\G show master status\G"
The key fields to look at are:
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error:
When no error is reported, the database is considered to be in sync. Note that there is no bit-by-bit comparison of the data between nodes; the status only indicates that all replication operations have succeeded.
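For illustration, a healthy replica typically reports values like the following in the "show slave status" output (the values shown are examples only; host names and log positions will differ on your system):

             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:

Any non-zero errno value or non-empty error string identifies the replication problem to investigate.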
For a system that is no longer showing "Out of Sync"
Review /var/chroot/mysql/var/log/mysqld.log from both DB nodes.
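As a quick way to surface recent replication-related messages from this log on each DB node, a generic filter such as the following can be used (adjust the pattern and line count as needed):

grep -iE "error|slave" /var/chroot/mysql/var/log/mysqld.log | tail -n 50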
Solutions can vary depending on the error found. The "Reset MySQL Replication" button will almost always resolve the replication errors.
Some common errors and solutions
This error is seen only on an old Space installation and means that the certificate has expired. Refer to KB34899 - [Junos Space] Database out of sync with error 2026 from "show slave status" to resolve this issue.
This is known behavior in some Junos Space versions after a system or process start or restart, and it will continue to be seen after each Junos Space restart until a patch is applied.
For Junos Space versions 19.2 and 19.3, apply the latest platform hot patch, which is available on the Downloads page for each version.
The following is a temporary workaround when "Out of Sync" is seen for one node only. If both nodes show "Out of Sync," perform the Reset process as described in Resetting DB Replication.
From the "Out of Sync" node CLI, execute:
mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "stop slave;start slave;"
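After restarting the slave threads, you can re-run the status command on the same node to confirm that replication has resumed, for example:

mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "show slave status\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Last_SQL_Errno"

Both Slave_IO_Running and Slave_SQL_Running should report Yes. If they do not, use the Resetting DB Replication procedure below.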
This means that the binary log data was deleted before it could be transferred. This happens if the "Out of Sync" condition has existed for a long time without being corrected, or if disk space is low. Use the Resetting DB Replication procedure below to correct this problem.
This error is caused when the other DB node is not reachable, usually because a DB node is down or restarting; it may also be due to a network outage. You can resolve this error by restoring network connectivity or by restoring the Space DB node that is down. However, additional DB sync issues may be seen afterwards.
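To confirm basic reachability from the affected node, a simple check such as the following can be used (the peer DB node IP is a placeholder; Junos Space MySQL replication is assumed here to use the standard MySQL port 3306):

ping -c 3 <peer-db-node-ip>

If ping succeeds but replication still fails, also verify that TCP port 3306 on the peer node is reachable from this node.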
These errors are caused when the data update being processed from the peer system cannot be completed. This is usually caused by one of the situations below:
- Outage of one Space node for any length of time (required replication mysql-bin files may be discarded before replication occurred)
- Low disk space on the VIP DB node (may trigger deletion of required mysql-bin files before they are consumed by the peer Space DB node; see the disk-space check below)
- Different edits made to the same data on each node under a failover condition, with the cluster then re-joined
- VMware snapshots taken at different times and restored
- (Software bug) Referencing of data in a table that is not replicated (rare)
If VMware snapshots are taken with Space processes running, they must be taken at exactly the same second (which may not be possible). If they are seconds or minutes apart, one or more database update commands may be marked as completed and discarded on one node but not included in the other node's snapshot. The problem may not become visible for days or weeks.
Resolve by using the Resetting DB Replication procedure below.
Note: Other error codes may have the same cause as error 1032.
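The disk-space check referenced in the causes above: where low disk space on the VIP DB node is suspected, free space and the size of the binary logs can be checked from the CLI. The exact binary-log location under /var/chroot/mysql varies by release, so locate the files first:

df -h /var
find /var/chroot/mysql -name "mysql-bin.*" -exec ls -lh {} + 2>/dev/null | tail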
Resetting DB Replication
Note: The Junos Space database configuration is set up as an active-active cluster, in which replication happens in both directions at all times. However, all Junos Space applications communicate only with the VIP address to update the database, so the VIP node will have the current and best data unless there was a recent VIP failover. If the VIP did fail over recently, it is possible that DB replication was already in a bad state before the failover but was not identified by the Space admin. In that case, the VIP may need to be failed back to the previous VIP node before resetting replication; otherwise data loss may result.
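Before resetting replication, it can help to confirm which node currently holds the VIP, because its data is used as the source for the reset. The VIP is bound to the eth0:0 interface (as noted in the disaster recovery steps later in this article), so on each DB node you can check for it with, for example:

ip addr show eth0 | grep "eth0:0"

The node that returns a matching secondary address is the current VIP node.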
To bring the databases in sync between the primary (VIP) node and the rest of the nodes in the Junos Space fabric, you can use the "Reset Replication" option available on the Network Management Platform > Administration > Fabric page in Junos Space Release 17.2R1 and later. In releases before Junos Space Platform Release 17.2R1, MySQL replication is reset by backing up and restoring the Junos Space Platform (MySQL) database.
Resetting replication of the MySQL database enables continuous and uninterrupted data replication between the VIP and non-VIP MySQL nodes, ensuring that there is no loss of data or network downtime. Data on the non-VIP DB node is purged and restored from the current VIP.
To reset MySQL replication on the Junos Space Network Management Platform, a user must be a Super Administrator or a System Administrator.
To reset replication of the MySQL database, perform the following steps:
- On the Junos Space Network Management Platform UI, select Administration > Fabric.
The Fabric page is displayed.
- Click the Reset MySQL Replication button.
The Reset MySQL Replication page is displayed.
- To reset the database replication, click the Reset MySQL Replication button.
The Reset MySQL Replication dialog box appears, displaying the job ID corresponding to the reset action.
- Click the job ID to view the details of the job.
You are redirected to the Job Management page with a filtered view of the job corresponding to the reset action.
- Double-click the row corresponding to the job to view its details. The View Job Details page displays the details of the job.
Click OK to close the page and return to the Reset MySQL Replication page.
Failure of the reset job indicates that the database nodes are still not synchronized. You can retry the procedure to reset the replication.
NOTE: Resetting the MySQL replication resets replication only between the database nodes on the active site. If you have configured disaster recovery, DR replication must also be reset. To back up and restore MySQL data on the standby site, stop and restart the disaster recovery process on the Junos Space Network Management Platform.
To stop the backup process at the active site, perform the following steps:
- Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.
- Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.
You are prompted to enter the administrator password.
- Enter the administrator password.
- Type jmp-dr stop at the shell prompt and press Enter.
The disaster recovery process on both sites is stopped.
- To restart the disaster recovery process on both sites, type jmp-dr start and press Enter.
The disaster recovery process is restarted on both sites.
The MySQL nodes at the standby site are backed up and restored.
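Once disaster recovery has been restarted, replication health on both sites can be re-checked with the same command referenced at the beginning of this article, for example:

jmp-dr health

The output should no longer report a MySQL replication error.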
If the database status is still shown as "Out of Sync" after following the above process, contact Support for further assistance.