
[Junos Space Platform] Database status showing "Out of Sync" on Fabric page


Article ID: KB33909  KB Last Updated: 16 Aug 2021  Version: 3.0
Summary:

Users managing a Junos Space Fabric that consists of two or more Space nodes or DB nodes may observe that the database status under Network Management Platform > Administration > Fabric is shown as "Out of Sync."

This article lists some common errors associated with this status message and provides solutions.

Symptoms:

The Junos Space Fabric UI shows an "Out of Sync" message, and jmp-dr health reports a MySQL replication error.

Cause:

Some possible causes are given below. To determine the cause, perform the following:

For a system that is showing "Out of Sync"

From the Junos Space CLI, use the following command to determine the replication error (this command also works for DR site DB nodes):

mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "show slave status\G show master status\G"

The key fields to look at are:

Last_IO_Errno:
Last_IO_Error:

Last_SQL_Errno:
Last_SQL_Error:

When there is no error, the database is considered to be in sync. There is no bit-by-bit comparison of the data; the only information available is whether all replication operations have succeeded.
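
For reference, a healthy replica typically reports output similar to the following illustrative excerpt (the full output contains many more fields, and values vary by deployment):

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Seconds_Behind_Master: 0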

For a system no longer showing "Out of Sync"

Review /var/chroot/mysql/var/log/mysqld.log from both DB nodes.
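
For example, a quick way to surface recent problems is to filter the log for errors and warnings (standard grep and tail usage; adjust the pattern and line count to suit your needs):

grep -iE 'error|warn' /var/chroot/mysql/var/log/mysqld.log | tail -n 50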

Disaster Recovery (DR) Deployments

When reviewing errors in DR deployments, keep the data replication directions in mind. The replication repair method used must take the replication peer into account.

Replication directions when DR is running:

  • The active DR site mysql VIP replicates data from the active DR site mysql non-VIP.

  • The active DR site mysql non-VIP replicates data from the active DR site mysql VIP.

  • The standby DR site mysql VIP node replicates data from the active DR site mysql VIP.

  • The standby DR site mysql non-VIP replicates from the standby DR site mysql VIP.

When DR is stopped or reset, database replication between the DR sites is disabled.

Use the command "show slave status" and observe "Master_Host" to find the replication peer.
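
For example, the replication peer can be extracted with the same credential lookup used in the command above:

mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "show slave status\G" | grep Master_Host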

Solution:

Solutions can vary depending on the error found. The "Reset MySQL Replication" button will almost always resolve the replication errors.

If the error is seen only in the standby DR site "show slave status" output, use the steps detailed in Disaster Recovery, Standby site reset (below) instead of Resetting DB Replication.

Some common errors and solutions

  • Last_IO_Errno: 2026

This error will be seen only on an old Space installation. It means that the certificate has expired. Refer to KB34899 - [Junos Space] Database out of sync with error 2026 from "show slave status" to resolve this issue.

  • Last_IO_Errno: 2005

This is known behavior in some Junos Space versions following a system or process start or restart. It will continue to occur after each Junos Space restart until a patch is applied.

For Junos Space versions 19.2 and 19.3, apply the latest platform hot patch, which is available on the Downloads page for each version.

The following is a temporary workaround when "Out of Sync" is seen for one node only. If both nodes show "Out of Sync," perform the Reset process as described in Resetting DB Replication.

From the "Out of Sync" node CLI, execute:

mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "stop slave;start slave;"
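
After restarting the slave threads, re-run the status command shown earlier and confirm that the error fields have cleared, for example:

mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "show slave status\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Last_IO_Errno|Last_SQL_Errno'
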
  • Last_IO_Errno: 1236
    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
    

This means that the binary log data was deleted before it could be transferred. This happens if the "Out of Sync" condition has existed for a long time without being corrected, or if disk space is low. Use the Resetting DB Replication procedure below to correct this problem.

  • Last_IO_Errno: 1045
    Last_IO_Error: error connecting to master 'repUser@<hostname>:3306' - retry-time: 10  retries: 7

This error occurs when the other DB node is not reachable, usually because a DB node is down or restarting. It may also be due to a network outage. You can resolve this error by restoring network connectivity or by restoring the Space DB node that is down. However, additional DB sync issues may be seen afterward.
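
As a basic check, you can extract the replication peer from "show slave status" and verify that it responds to ping from the affected node. This is a reachability test only and does not confirm that MySQL is running on the peer:

peer=$(mysql -ujboss -p$(grep mysql.jboss /etc/sysconfig/JunosSpace/pwd | awk -F= '{print $2}') -e "show slave status\G" | awk '/Master_Host/ {print $2}')
ping -c 3 "$peer"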

  • Last_SQL_Errno: 1032
    Last_SQL_Error: Could not execute Update_rows event on table ...
    
    Last_SQL_Errno: 1062
    Last_SQL_Error: Could not execute Write_rows event on table ...
    
    Last_Errno: 1451
    Last_Error: Could not execute Delete_rows event on table ...

These errors occur when a data update received from the peer system cannot be applied. This is usually caused by one of the situations below:

  1. Outage of one Space node for any length of time (required replication mysql-bin files may be discarded before replication occurred)

  2. Low disk space on VIP DB node (may trigger delete of required mysql-bin files before they are consumed by the peer Space DB node)

  3. Under a failover condition, if different edits are made to the same data on each node, and the cluster is then re-joined

  4. VMware snapshots taken at different times and restored

  5. (Software bug) Referencing of data in a table that is not replicated (rare)

If VMware snapshots are taken while Space processes are running, they must be taken at the exact same second (which may not be possible). If they are seconds or minutes apart, one or more database update commands may be marked as completed and discarded on one node, but not included in the other node's snapshot. The resulting problem may not become visible for days or weeks.

Resolve by using the steps in Resetting DB Replication below.

Note: Other error codes may have the same cause as error 1032.

 

Resetting DB Replication

Note: The Junos Space database configuration is set up as an active-active cluster (if DR is configured, only at the active site), wherein replication happens in both directions at all times. At the same time, all Junos Space applications communicate only with the VIP address to update the database. The VIP node will have the current and best data, unless there was a recent VIP failover. If the VIP did fail over recently, it is possible that DB replication was in a bad state prior to the failover but was not identified by the Space admin. In that case, the VIP may need to be failed back to the previous VIP node, or data loss may result from resetting replication.

To bring the databases in sync between the primary (VIP) node and the rest of the nodes in the fabric of Junos Space, you can use the "Reset Replication" option that is available on the Network Management Platform > Administration > Fabric page from Junos Space Release 17.2R1 and later. In releases before Junos Space Platform Release 17.2R1, resetting of MySQL replication is done by backing up and restoring the Junos Space Platform database.

Resetting replication of the MySQL database enables continuous and uninterrupted data replication between the VIP and non-VIP MySQL nodes. This ensures that there is no loss of data or network downtime. Data on the non-VIP DB node is purged and restored from the current VIP.

To reset MySQL replication on the Junos Space Network Management Platform, a user must be a Super Administrator or a System Administrator.

To reset replication of the MySQL database, perform the following steps:

Note: The following process applies to Space setups with two or more nodes; if you have DR, it applies only to the active site.

  1. On the Junos Space Network Management Platform UI, select Administration > Fabric.

The Fabric page is displayed.

  2. Click the Reset MySQL Replication button.

The Reset MySQL Replication page is displayed.

  3. To reset the database replication, click the Reset MySQL Replication button.

The Reset MySQL Replication dialog box appears, displaying the job ID corresponding to the reset action.

  4. Click the job ID to view the details of the job.

You are redirected to the Job Management page with a filtered view of the job corresponding to the reset action.

  5. Double-click the row corresponding to the job to view the details of the job. The View Job Details page displays the details of the job.

Click OK to close the page and return to the Reset MySQL Replication page.

Failure of the reset job indicates that the database nodes are still not synchronized. You can retry the procedure to reset the replication. 

NOTE: Resetting the MySQL replication resets replication only between the database nodes on the active DR site. If you have configured disaster recovery, DR replication must also be reset. To back up and restore MySQL data on the standby site, stop and restart the disaster recovery process on the Junos Space Network Management Platform, as described below.

Disaster Recovery, Standby site reset

To stop and restart the disaster recovery process from the active site, perform the following steps:

  1. Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.

The Junos Space Settings Menu is displayed.

  2. Enter 6 (if you are using a hardware appliance) or 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.

You are prompted to enter the administrator password.

  3. Enter the administrator password.

  4. Type jmp-dr stop at the shell prompt and press Enter.

The disaster recovery process on both sites is stopped.

  5. To restart the disaster recovery process on both sites, type jmp-dr start and press Enter.

The disaster recovery process is restarted on both sites.

The MySQL data at the primary site is backed up, then restored on the standby site.
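
Once both sites are running again, you can confirm the result with the checks used earlier; for example, run the following on each site and re-check "show slave status" on the DB nodes to verify that no replication errors remain:

jmp-dr health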

If the database status is still shown as "Out of Sync" after following the above process, contact Support for further assistance.

Modification History:

2021-08-16: DR site clarifications added

 
