[Contrail] How to check watches in Zookeeper

Article ID: KB35223 KB Last Updated: 16 Nov 2019 Version: 1.0
Summary:

During troubleshooting, if an active Contrail service such as svc-monitor, schema, or device-manager is down and no backup service has been detected to replace it, a failover problem may have occurred. In this case, verify the watches set by the backup Contrail services on the Zookeeper nodes.

Zookeeper uses watches to notify a client of znode changes. This article explains how to check the watches set on Zookeeper servers and how they are used.

Solution:

Use the Zookeeper 'wchc' command to list all watches set on the Zookeeper server, grouped by session.

Example:

# echo wchc | nc localhost 2181
<snipped>
0x16df603bab90002
        /svc-monitor/ead8d60a310c448ab530809bc15bc354__lock__0000000559
<snipped>
0x16df603bab90005
        /svc-monitor/140cb1cea3d34227a34194eb4102e5b9__lock__0000000572
<snipped>

In this output, session '0x16df603bab90002' has a watch set on znode '/svc-monitor/ead8d60a310c448ab530809bc15bc354__lock__0000000559', and session '0x16df603bab90005' has a watch set on znode '/svc-monitor/140cb1cea3d34227a34194eb4102e5b9__lock__0000000572'.
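
As a rough illustration, the session-to-znode mapping can be pulled out of saved 'wchc' output with a few lines of Python. This is only a sketch: parse_wchc is a hypothetical helper name, and it assumes the layout shown above, where each session ID sits on its own line starting with 0x and is followed by indented znode paths. The <snipped> markers come from this article, not from Zookeeper, and are ignored.

```python
# Sample text copied from the wchc output shown in this article.
SAMPLE = """\
<snipped>
0x16df603bab90002
        /svc-monitor/ead8d60a310c448ab530809bc15bc354__lock__0000000559
<snipped>
0x16df603bab90005
        /svc-monitor/140cb1cea3d34227a34194eb4102e5b9__lock__0000000572
<snipped>
"""


def parse_wchc(text):
    """Return {session_id: [watched znode paths]} from wchc-style text."""
    watches = {}
    session = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("0x"):
            # A session ID line opens a new group of watched paths.
            session = stripped
            watches.setdefault(session, [])
        elif stripped.startswith("/") and session is not None:
            # An indented path belongs to the most recent session.
            watches[session].append(stripped)
        # Anything else (blank lines, "<snipped>") is ignored.
    return watches


if __name__ == "__main__":
    for sid, paths in parse_wchc(SAMPLE).items():
        for path in paths:
            print(sid, "watches", path)
```

In practice the input text would be the saved output of 'echo wchc | nc localhost 2181' rather than the embedded sample.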

These watches are set because each of the sessions above is waiting to acquire the lock to become the active svc-monitor. The sessions do not poll Zookeeper periodically to see whether they can get the lock. Instead, they wait for Zookeeper to notify them when the target znode is removed or changed. A closer look inside Zookeeper shows that both sessions, 0x16df603bab90002 and 0x16df603bab90005, belong to svc-monitor instances in backup status, so each one watches the znode of another session that has higher priority for acquiring the lock.

On controller1, use the 'contrail-status -d | grep svc' command to get the svc-monitor status.

contrail-svc-monitor    backup    pid 10632, uptime 6 days, 20:43:49      

On controller2, use the 'contrail-status -d | grep svc' command to get the svc-monitor status.

contrail-svc-monitor    backup    pid 29731, uptime 6 days, 20:46:00  

On controller3, use the 'contrail-status -d | grep svc' command to get the svc-monitor status.

contrail-svc-monitor    active    pid 22166, uptime 24 days, 2:38:44   
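
When the status lines are collected from several controllers, the state and PID can be extracted with a small parser. The following is a minimal sketch, assuming the line layout shown above (service name, state, then "pid <number>"). The names parse_status_line, STATUS_RE, and LINES are hypothetical and not part of any Contrail tool.

```python
import re

# Pattern for the status lines shown above: service, state, then "pid <number>".
STATUS_RE = re.compile(r"^(?P<service>\S+)\s+(?P<state>\S+)\s+pid\s+(?P<pid>\d+)")


def parse_status_line(line):
    """Return (service, state, pid), or None if the line does not match."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    return match.group("service"), match.group("state"), int(match.group("pid"))


# Lines copied from the three controllers in this article.
LINES = {
    "controller1": "contrail-svc-monitor    backup    pid 10632, uptime 6 days, 20:43:49",
    "controller2": "contrail-svc-monitor    backup    pid 29731, uptime 6 days, 20:46:00",
    "controller3": "contrail-svc-monitor    active    pid 22166, uptime 24 days, 2:38:44",
}

if __name__ == "__main__":
    for host, line in LINES.items():
        print(host, parse_status_line(line))
```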

Then use the '/usr/share/zookeeper/bin/zkCli.sh' command to enter the Zookeeper CLI and check each child znode under /svc-monitor.

[zk: localhost:2181(CONNECTED) 2] get /svc-monitor/f222871e20d047cfade1003f9f485740__lock__0000000573
10632
cZxid = 0x700000000c
ctime = Tue Oct 22 17:28:23 PDT 2019
mZxid = 0x700000000c
mtime = Tue Oct 22 17:28:23 PDT 2019
pZxid = 0x700000000c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16df603bab90005
dataLength = 5
numChildren = 0

[zk: localhost:2181(CONNECTED) 1] get /svc-monitor/140cb1cea3d34227a34194eb4102e5b9__lock__0000000572
29731
cZxid = 0x7000000006
ctime = Tue Oct 22 17:28:22 PDT 2019
mZxid = 0x7000000006
mtime = Tue Oct 22 17:28:22 PDT 2019
pZxid = 0x7000000006
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16df603bab90002
dataLength = 5
numChildren = 0

[zk: localhost:2181(CONNECTED) 0] get /svc-monitor/ead8d60a310c448ab530809bc15bc354__lock__0000000559
22166
cZxid = 0x6c000001c9
ctime = Sat Oct 05 11:27:33 PDT 2019
mZxid = 0x6c000001c9
mtime = Sat Oct 05 11:27:33 PDT 2019
pZxid = 0x6c000001c9
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x26d9d24aec60008
dataLength = 5
numChildren = 0
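
The 'get' output above can also be turned into a dictionary for scripting. This is a sketch only, assuming the layout shown in this article: the znode data on the first line, followed by "key = value" stat lines. The helper name parse_get_output is hypothetical.

```python
# Output of one "get" command, copied from this article.
SAMPLE_GET = """\
29731
cZxid = 0x7000000006
ctime = Tue Oct 22 17:28:22 PDT 2019
mZxid = 0x7000000006
mtime = Tue Oct 22 17:28:22 PDT 2019
pZxid = 0x7000000006
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16df603bab90002
dataLength = 5
numChildren = 0
"""


def parse_get_output(text):
    """Return a dict with the znode data under 'data' plus each stat field."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    # Assumption: the first non-empty line is the znode data.
    info = {"data": lines[0]}
    for ln in lines[1:]:
        key, sep, value = ln.partition(" = ")
        if sep:
            info[key] = value
    return info


if __name__ == "__main__":
    info = parse_get_output(SAMPLE_GET)
    print("data:", info["data"], "owner session:", info["ephemeralOwner"])
```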

The output above shows the contrail-status result and the znode related to each svc-monitor instance. The data stored in each znode is the PID of the svc-monitor instance that created it, so you can match a znode against the contrail-status output to decide whether it represents the active or a backup svc-monitor. In addition, the "ephemeralOwner" of each znode is the session ID of the svc-monitor that created it. For example, on controller2, the svc-monitor with PID 29731 is a backup. It established session 0x16df603bab90002 with Zookeeper and then created znode /svc-monitor/140cb1cea3d34227a34194eb4102e5b9__lock__0000000572. Under /svc-monitor, the next child with higher priority is /svc-monitor/ead8d60a310c448ab530809bc15bc354__lock__0000000559. The backup therefore sets a watch, recorded as a mapping from its session ID to the target znode, on that single znode immediately ahead of it in priority.
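
The priority chain described above can be sketched in Python. This assumes that a lower sequence suffix after "__lock__" means higher priority, which matches the output in this article (the active instance holds 0000000559) and the usual Zookeeper lock pattern. The function names are hypothetical.

```python
def sequence_of(znode):
    """Return the integer sequence suffix that follows '__lock__'."""
    return int(znode.rsplit("__lock__", 1)[1])


def expected_watch_targets(children):
    """Map each lock znode to the znode it should watch (None for the lock holder)."""
    ordered = sorted(children, key=sequence_of)
    targets = {ordered[0]: None} if ordered else {}
    # Each contender watches the znode immediately ahead of it in sequence order.
    for previous, current in zip(ordered, ordered[1:]):
        targets[current] = previous
    return targets


# Children of /svc-monitor as seen in this article.
CHILDREN = [
    "f222871e20d047cfade1003f9f485740__lock__0000000573",
    "140cb1cea3d34227a34194eb4102e5b9__lock__0000000572",
    "ead8d60a310c448ab530809bc15bc354__lock__0000000559",
]

if __name__ == "__main__":
    for znode, target in expected_watch_targets(CHILDREN).items():
        print(znode, "->", target)
```

With the three children from this article, the znode ending in 573 maps to 572, 572 maps to 559, and 559 maps to None because it holds the lock.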

The svc-monitor on controller1 sets a watch on the znode of the svc-monitor on controller2, and the svc-monitor on controller2 watches the svc-monitor on controller3, which is in active status. If any of these watches on the /svc-monitor child znodes are missing from the "echo wchc | nc localhost 2181" output, you may experience issues during svc-monitor failover.
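
The check for missing watches can be expressed as a short script. This is a hedged sketch, not a Juniper tool: find_missing_watches is a hypothetical helper, the input dictionaries are typed in by hand from the outputs in this article, and lower sequence numbers are assumed to have higher priority.

```python
BASE = "/svc-monitor"

# znode name -> ephemeralOwner, copied from the zkCli output in this article.
OWNERS = {
    "f222871e20d047cfade1003f9f485740__lock__0000000573": "0x16df603bab90005",
    "140cb1cea3d34227a34194eb4102e5b9__lock__0000000572": "0x16df603bab90002",
    "ead8d60a310c448ab530809bc15bc354__lock__0000000559": "0x26d9d24aec60008",
}

# session -> watched paths, copied from the wchc output in this article.
WATCHES = {
    "0x16df603bab90002": [BASE + "/ead8d60a310c448ab530809bc15bc354__lock__0000000559"],
    "0x16df603bab90005": [BASE + "/140cb1cea3d34227a34194eb4102e5b9__lock__0000000572"],
}


def find_missing_watches(owners, watches, base=BASE):
    """Return (session, target_path) pairs for backups lacking the expected watch."""
    ordered = sorted(owners, key=lambda name: int(name.rsplit("__lock__", 1)[1]))
    missing = []
    for previous, current in zip(ordered, ordered[1:]):
        session = owners[current]        # session that created the backup znode
        target = base + "/" + previous   # znode immediately ahead in priority
        if target not in watches.get(session, []):
            missing.append((session, target))
    return missing


if __name__ == "__main__":
    problems = find_missing_watches(OWNERS, WATCHES)
    if not problems:
        print("all expected watches are present")
    for session, target in problems:
        print("session", session, "has no watch on", target)
```

An empty result means every backup session watches the znode directly ahead of it. Any pair in the result identifies a session to investigate before relying on failover.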

For more on troubleshooting Zookeeper, refer to KB31144 - Contrail Getting Started - Administration, Configuration & Troubleshooting (JumpStation)
