[Contrail] How to debug Znodes inconsistency and re-sync

Article ID: KB34067 KB Last Updated: 06 Apr 2019 Version: 1.0
Summary:

The servers in a ZooKeeper cluster may report different znode counts after a network outage, a full disk, or other failures. The db_manage script can detect and report such errors.

This article explains how to compare the ZooKeeper databases to find the inconsistencies and how to bring a follower back in sync with the leader after such a failure. The Contrail versions discussed here are 3.2.x and 3.0.x.

Symptoms:

During a typical Contrail DB cleanup process, a few sanity checks are performed, one of which checks the node counts in the ZooKeeper cluster. In the following example, the sanity check reports an issue:

2019-03-12 16:44:03,726 ERROR: (v1.12) Checker check_zk_mode_and_node_count: Failed:

Error, Differing node counts [2457, 2524, 2457].

If we turn on the --debug flag for check_zk_mode_and_node_count, we see more details, as follows. One of the two followers (172.30.1.42, node count 2524) is out of sync with the other two ZooKeeper nodes (172.30.1.17 and 172.30.1.23, node count 2457).

python /usr/lib/python2.7/dist-packages/vnc_cfg_api_server/db_manage.py --debug check_zk_mode_and_node_count
2019-03-13 17:43:14,580 DEBUG: Issuing 'stat' on 172.30.1.17:2181:
2019-03-13 17:43:14,583 DEBUG: Got: Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT

Clients:

/172.30.1.23:34383[1](queued=0,recved=1061,sent=1061)
/172.30.1.16:49918[1](queued=0,recved=13142,sent=13142)
/172.30.1.16:49909[1](queued=0,recved=13110,sent=13110)
/172.30.1.39:55153[1](queued=0,recved=13156,sent=13156)
/172.30.1.17:56331[0](queued=0,recved=1,sent=0)
/172.30.1.39:55141[1](queued=0,recved=13073,sent=13073)
/172.30.1.23:34423[1](queued=0,recved=1036,sent=1036)
/172.30.1.39:55144[1](queued=0,recved=13162,sent=13163)
/172.30.1.17:56330[1](queued=0,recved=1,sent=1)
/172.30.1.23:34424[1](queued=0,recved=1038,sent=1039)
/172.30.1.38:58501[1](queued=0,recved=13057,sent=13057)
/172.30.1.42:40532[1](queued=0,recved=1037,sent=1037)
/172.30.1.17:56327[1](queued=0,recved=1,sent=1)
/172.30.1.17:46770[1](queued=0,recved=1046,sent=1046)
/172.30.1.38:58505[1](queued=0,recved=13095,sent=13095)

Latency min/avg/max: 0/0/39
Received: 97138
Sent: 97140
Connections: 15
Outstanding: 0
Zxid: 0x1ee0000055f
Mode: follower
Node count: 2457

2019-03-13 17:43:14,592 DEBUG: Issuing 'stat' on 172.30.1.23:2181:
2019-03-13 17:43:14,594 DEBUG: Got: Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
Clients:
/172.30.1.17:54736[1](queued=0,recved=1,sent=1)
/172.30.1.42:35329[1](queued=0,recved=1036,sent=1036)
/172.30.1.17:54737[0](queued=0,recved=1,sent=0)
/172.30.1.38:35480[1](queued=0,recved=13136,sent=13136)
/172.30.1.17:45222[1](queued=0,recved=1036,sent=1036)

Latency min/avg/max: 0/2/7237
Received: 15523
Sent: 15515
Connections: 5
Outstanding: 0
Zxid: 0x1ee00000561
Mode: leader
Node count: 2457

2019-03-13 17:43:14,602 DEBUG: Issuing 'stat' on 172.30.1.42:2181:
2019-03-13 17:43:14,604 DEBUG: Got: Zookeeper version: 3.4.5--1, built on 06/10/2013 17:26 GMT
Clients:
/172.30.1.17:53796[0](queued=0,recved=1,sent=0)
/172.30.1.17:53795[1](queued=0,recved=1,sent=1)
/172.30.1.24:36664[1](queued=0,recved=20442,sent=20442)
/172.30.1.16:56135[1](queued=0,recved=12976,sent=12976)
/172.30.1.23:40583[1](queued=0,recved=1035,sent=1035)
/172.30.1.42:40078[1](queued=0,recved=1069,sent=1069)
/172.30.1.17:44281[1](queued=0,recved=1036,sent=1036)
/172.30.1.42:40074[1](queued=0,recved=1033,sent=1033)
/172.30.1.17:44232[1](queued=0,recved=1033,sent=1033)
/172.30.1.41:53997[1](queued=0,recved=20441,sent=20441)
/172.30.1.40:59413[1](queued=0,recved=20450,sent=20450)

Latency min/avg/max: 0/0/8719
Received: 83406
Sent: 83405
Connections: 11
Outstanding: 0
Zxid: 0x1ee00000563
Mode: follower
Node count: 2524
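
The comparison that the debug output makes visible can be sketched in a few lines of Python. This is only an illustrative sketch, not the logic used by db_manage.py: the function names parse_stat and followers_out_of_sync are hypothetical, the code is written for Python 3, and the sample text is abbreviated from the 'stat' output above.

```python
import re


def parse_stat(text):
    """Extract the Mode and Node count fields from ZooKeeper 'stat' output."""
    mode = re.search(r"^Mode:\s*(\S+)", text, re.MULTILINE)
    count = re.search(r"^Node count:\s*(\d+)", text, re.MULTILINE)
    if mode is None or count is None:
        raise ValueError("not a complete 'stat' response")
    return mode.group(1), int(count.group(1))


def followers_out_of_sync(stats):
    """Return {server: node_count} for servers that disagree with the leader.

    stats maps "host:port" to the raw 'stat' text from that server.
    """
    parsed = {server: parse_stat(text) for server, text in stats.items()}
    leaders = [s for s, (mode, _) in parsed.items() if mode == "leader"]
    if len(leaders) != 1:
        raise ValueError("expected exactly one leader, found %d" % len(leaders))
    leader_count = parsed[leaders[0]][1]
    return {s: count for s, (_, count) in parsed.items() if count != leader_count}


# Abbreviated 'stat' replies taken from the debug output above.
stats = {
    "172.30.1.17:2181": "Zxid: 0x1ee0000055f\nMode: follower\nNode count: 2457\n",
    "172.30.1.23:2181": "Zxid: 0x1ee00000561\nMode: leader\nNode count: 2457\n",
    "172.30.1.42:2181": "Zxid: 0x1ee00000563\nMode: follower\nNode count: 2524\n",
}
print(followers_out_of_sync(stats))
# {'172.30.1.42:2181': 2524}
```

Comparing against the leader, instead of against the most common count, matches the goal of this article, which is to bring a follower back in line with the leader.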

Cause:

Various failures, such as a network failure or a full disk, can put ZooKeeper into this state. A typical follow-up question is how to find out which items account for the extra znodes in the follower's database.

Solution:

You can take a DB snapshot in JSON format by following the steps described in the Juniper Contrail Feature Guide. Then use any text file comparison tool to compare the ZooKeeper DB contents taken from a good node with those taken from a bad node. In the case above, the extra znodes all belong to the analytics services contrail-topology and contrail-snmp-collector, and most of them are locks, which might indicate a connectivity issue between the controllers and the analytics nodes.

+ /contrail_cs/contrail-topology/locks/46/1c88c72133034ce3826de4083a4e9702_lock_0000001340/
+ /contrail_cs/contrail-topology/locks/19/935fd7bf953049ac97d179d407024a06_lock_0000001804/
+ /contrail_cs/contrail-topology/locks/38/1b5a6e4bf7974489aa4635c4dcb1575e_lock_0000001341/
+ /contrail_cs/contrail-snmp-collector/locks/6/d71a9c793dc7466f835b26ea90c8c2ef_lock_0000013459/
+ /contrail_cs/contrail-topology/locks/23/98f339805b39457aaca3874a887917e9_lock_0000002107/
+ /contrail_cs/contrail-snmp-collector/locks/27/40d8bc50602141288957fada3e2440bd_lock_0000001976/
+ /contrail_cs/contrail-snmp-collector/locks/17/0abee090d6dd4bec8c2ff233f4526a7c_lock_0000002807/
+ /contrail_cs/contrail-topology/locks/26/1827cef04019498ba04d156357f8953b_lock_0000002063/
+ /contrail_cs/contrail-snmp-collector/locks/10/2db39f3b290e4a328f6006e3d610dc4e_lock_0000006999/
+ /contrail_cs/contrail-topology/locks/34/b29a568969414490aba8446501998962_lock_0000001093/
+ /contrail_cs/contrail-snmp-collector/locks/3/769df95861ef4d39bcd1875314530db2_lock_0000013352/
+ /contrail_cs/contrail-topology/locks/24/b62e5ed8b8954fc6b96b7e43f5021899_lock_0000001919/
+ /contrail_cs/contrail-topology/locks/8/7d8e85053a8546e496cd0d18051fc38d_lock_0000004845/
+ /contrail_cs/contrail-snmp-collector/locks/13/d3ddbf845b2f4418928e2bcbe2d11c93_lock_0000008114/
+ /contrail_cs/contrail-topology/party/0a5934039d3f428f96e4a9a727ec082b-messaging-mtunjvmcnal03.mdt21b.aic.cip.att.com-5576/
+ /contrail_cs/contrail-snmp-collector/locks/26/9a8014b4d8684db7b91d8652dbfb5f4a_lock_0000002061/
+ /contrail_cs/contrail-topology/locks/44/112a90198fae4b3ab847e061c6188514_lock_0000001069/
+ /contrail_cs/contrail-topology/locks/28/eb07699a758a47a4ae35525bab2e3f23_lock_0000001633/
+ /contrail_cs/contrail-snmp-collector/locks/35/83c9d8d4970c435d8a2a49a52505f6c4_lock_0000001427/
+ /contrail_cs/contrail-snmp-collector/locks/46/cc65791eefd44df3bec0f56975df928e_lock_0000001373/
+ /contrail_cs/contrail-snmp-collector/locks/7/3f4848826a9447bba13be332929b15c7_lock_0000010687/
+ /contrail_cs/contrail-topology/locks/15/79018a07c56b4760802dedff4ef148a9_lock_0000002049/
+ /contrail_cs/contrail-topology/party/ff5d7a03d5934714adea3b49968a96ca-mtunjvmcnal01.mdt21b.aic.cip.att.com-21719/
+ /contrail_cs/contrail-snmp-collector/locks/44/533425a2cd9844b1b02946df15b8741a_lock_0000001266/
+ /contrail_cs/contrail-topology/locks/25/4139ab330a3e4283a6735e9ad3eec8d0_lock_0000001642/
+ /contrail_cs/contrail-topology/locks/21/0e6658da9db1492f90798c2e96b0d61a_lock_0000003542/
+ /contrail_cs/contrail-snmp-collector/locks/43/6a4fa27f31f8444d917dcb8361bd2bab_lock_0000002023/
+ /contrail_cs/contrail-snmp-collector/locks/19/f5287d4ce1ba41d7a56393fa0e02def6_lock_0000003093/
+ /contrail_cs/contrail-topology/locks/35/2c16fb3b79454e7b9a26d68f1d308c02_lock_0000001442/
+ /contrail_cs/contrail-topology/locks/5/263277c91bbb47e38b8dd798c8275118_lock_0000005026/
+ /contrail_cs/contrail-topology/locks/6/70188f2bcc2f412b9485d0398247a2a0_lock_0000008920/
+ /contrail_cs/contrail-topology/locks/32/281c0b93a00146a4bf0339e1018bee28_lock_0000001404/
+ /contrail_cs/contrail-snmp-collector/locks/36/0130b00f28544252af0cd7a26d5eff75_lock_0000001872/
+ /contrail_cs/contrail-snmp-collector/locks/24/90ded1a885d84748a1470fe856f75743_lock_0000003157/
+ /contrail_cs/contrail-snmp-collector/locks/22/610803b091894209b38e905f91596086_lock_0000003017/
+ /contrail_cs/contrail-snmp-collector/locks/12/137f077a3e414545bb50eeb6b51d154f_lock_0000006231/
+ /contrail_cs/contrail-topology/locks/37/4843cb27e2d2472b91877f6864178323_lock_0000001033/
+ /contrail_cs/contrail-snmp-collector/party/9079164476e44d53a202556b91e7052c-messaging-mtunjvmcnal02.mdt21b.aic.cip.att.com-31077/
+ /contrail_cs/contrail-snmp-collector/locks/23/ecd3cee3bea84ddfabd203f5995d4f00_lock_0000003145/
+ /contrail_cs/contrail-snmp-collector/locks/11/641fde79e7404125ab6f4ce02764b2a9_lock_0000007592/
+ /contrail_cs/contrail-topology/locks/39/f17727061b9e40d38a290f56fbb9da00_lock_0000001303/
+ /contrail_cs/contrail-topology/locks/29/764b0c8bea054072b41850396ad3f757_lock_0000001521/
+ /contrail_cs/contrail-topology/locks/27/b6e6f08ab4484ac8b93af77c4a95d586_lock_0000001930/
+ /contrail_cs/contrail-topology/locks/45/5cdbebd44fa845a0842ec38bb32638c9_lock_0000001592/
+ /contrail_cs/contrail-snmp-collector/locks/41/59e8c17788f749f0b6fd51266f15dcee_lock_0000001581/
+ /contrail_cs/contrail-snmp-collector/locks/8/d7d31525d5484edeabe6e690fea5b32d_lock_0000007616/
+ /contrail_cs/contrail-snmp-collector/locks/34/d00f2cef96024a9ea4cbef57aa8b5cd2_lock_0000002097/
+ /contrail_cs/contrail-topology/locks/41/6c2b0acb90cb414e8c4e1591eb43caa3_lock_0000001265/
+ /contrail_cs/contrail-topology/locks/10/b0e43c1b8c964e91b71e4ec943d676b3_lock_0000003954/
+ /contrail_cs/contrail-topology/locks/36/5f31e3403c374cfe98528bbd78a8f221_lock_0000001308/
+ /contrail_cs/contrail-topology/locks/40/cb821f65bb9e40c1a2613aef8ea827f3_lock_0000001471/
+ /contrail_cs/contrail-snmp-collector/locks/29/6b21e675968346099b4d95c83a24cf1b_lock_0000001728/
+ /contrail_cs/contrail-topology/locks/33/2f2335e16a6440e988f7e8a3a1cd2186_lock_0000001365/
+ /contrail_cs/contrail-topology/locks/1/a01f0836c1394d06a52d06f9ef67a648_lock_0000014554/
+ /contrail_cs/contrail-snmp-collector/locks/45/407c55ba583a4428b44fb2ddd4b7210e_lock_0000001607/
+ /contrail_cs/contrail-snmp-collector/locks/30/a2b5aa73d2d64d66ac1e5cbec3d99d46_lock_0000001986/
+ /contrail_cs/contrail-snmp-collector/locks/4/1185b6d073d84d60b94f2f5f1dc120e9_lock_0000011491/
+ /contrail_cs/contrail-snmp-collector/locks/33/a6ed87990dbf42d7a47cd6c477bab471_lock_0000002347/
+ /contrail_cs/contrail-snmp-collector/locks/2/58d194dbf0ca4dd29c8c4d92b5594a4f_lock_0000015103/
+ /contrail_cs/contrail-topology/locks/3/0ef9b4bedde9463aa8557216ac8c365d_lock_0000009532/
+ /contrail_cs/contrail-topology/locks/13/ed9253679f914eaf8bac5a98c0317b71_lock_0000003649/
+ /contrail_cs/contrail-snmp-collector/locks/14/459554cbee424274815332d55adb8d85_lock_0000005235/
+ /contrail_cs/contrail-snmp-collector/locks/31/1e4ac957fb26431686c5571e134509c6_lock_0000001802/
+ /contrail_cs/contrail-snmp-collector/party/2188497b431a4a36af7aabb2b04ef316-mtunjvmcnal01.mdt21b.aic.cip.att.com-28380/
+ /contrail_cs/contrail-topology/locks/4/da1de78caa3e41bc8e66f704c8d65306_lock_0000009333/
+ /contrail_cs/contrail-topology/locks/18/42c3f6c2e6b7448d9a2aab30ae7d1252_lock_0000002338/
+ /contrail_cs/contrail-topology/locks/17/3ebda6ef82b64f809a76a881edd63f53_lock_0000002830/
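
A listing like the one above can be produced and summarized with a short script once the znode paths have been extracted from each snapshot. The following is a minimal Python 3 sketch under that assumption: it does not parse the snapshot JSON itself (its layout is not described here), and the function names extra_znodes and summarize, as well as the shared path in the sample, are hypothetical.

```python
from collections import Counter


def extra_znodes(good_paths, bad_paths):
    """Return the paths present on the bad node but missing from the good node."""
    return sorted(set(bad_paths) - set(good_paths))


def summarize(paths):
    """Count paths by (service, kind), e.g. ('contrail-topology', 'locks')."""
    summary = Counter()
    for path in paths:
        parts = path.strip("/").split("/")
        # parts[0] is the root (contrail_cs), parts[1] the service,
        # parts[2] the kind of entry (locks or party).
        if len(parts) >= 3:
            summary[(parts[1], parts[2])] += 1
        else:
            summary[tuple(parts)] += 1
    return summary


# Hypothetical path that both nodes share.
good = ["/contrail_cs/"]
# Three of the extra paths from the listing above, plus the shared one.
bad = good + [
    "/contrail_cs/contrail-topology/locks/46/1c88c72133034ce3826de4083a4e9702_lock_0000001340/",
    "/contrail_cs/contrail-topology/locks/19/935fd7bf953049ac97d179d407024a06_lock_0000001804/",
    "/contrail_cs/contrail-snmp-collector/locks/6/d71a9c793dc7466f835b26ea90c8c2ef_lock_0000013459/",
]

extra = extra_znodes(good, bad)
for (service, kind), n in summarize(extra).most_common():
    print(service, kind, n)
# contrail-topology locks 2
# contrail-snmp-collector locks 1
```

Grouping by service and kind makes it easier to see at a glance whether the extra znodes are concentrated in one component, as they are in this case.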

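One possible way to recover is to move the follower's on-disk data aside so that it resyncs from the leader when it restarts. Take the following steps on the out-of-sync follower node, and then confirm that its node count matches the leader's: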

  1. service zookeeper stop

  2. mv /var/lib/zookeeper/version-2 /var/lib/zookeeper/version-2-backup

  3. ls -F /var/lib/zookeeper/

  4. service zookeeper start

  5. echo stat | nc 127.0.0.1 2181
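
The check in step 5 can also be scripted across all servers to confirm that the follower has caught up. The sketch below is one possible approach in Python 3, assuming the ZooKeeper 'stat' four-letter command is reachable on port 2181 as in the steps above. The function names four_letter_word, node_count, and in_sync are hypothetical.

```python
import socket


def four_letter_word(host, port=2181, command=b"stat", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def node_count(stat_text):
    """Return the integer from the 'Node count:' line of a 'stat' reply."""
    for line in stat_text.splitlines():
        if line.startswith("Node count:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Node count' line in reply")


def in_sync(servers, fetch=four_letter_word):
    """Return (True, counts) if every server reports the same node count.

    fetch is called with a host name and must return the raw 'stat' text;
    it can be replaced with a stub for testing without a live cluster.
    """
    counts = {host: node_count(fetch(host)) for host in servers}
    return len(set(counts.values())) == 1, counts
```

For the cluster in this article, the call would be `in_sync(["172.30.1.17", "172.30.1.23", "172.30.1.42"])`. Because znodes are created and deleted continuously on a live cluster, a brief mismatch between servers does not necessarily mean that the resync failed. Repeat the check before drawing a conclusion.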
