[Contrail] Post upgrade to 1909 config_schema_1 container crash due to zookeeper key/path/index conflict

Article ID: KB35197    KB Last Updated: 25 Oct 2019    Version: 1.0
Summary:
This KB explains a specific schema conversion issue seen during an ISSU upgrade from release 3.2.x to 1909.
Symptoms:

The /var/log/contrail/schema-zk.log file shows the following errors:

10/21/2019 12:31:14 PM [schema]: Sending request(xid=1823): GetData(path='/id/bgp/route-targets/type0/0008000000', watcher=None)
10/21/2019 12:31:14 PM [schema]: Received response(xid=1823): ('default-domain:default-project:ip-fabric:ip-fabric', ZnodeStat(czxid=12884951545, mzxid=12884951545, ctime=1571589897617, mtime=1571589897617, version=0, cversion=0, aversion=0, ephemeralOwner=0, dataLength=50, numChildren=0, pzxid=12884951545))
*******10/21/2019 12:31:14 PM [schema]: For index 8000000 reserve conflicts with existing value default-domain:default-project:ip-fabric:ip-fabric.********
10/21/2019 12:31:14 PM [schema]: Sending request(xid=1824): Close()
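To check a node for this symptom without reading the whole log, the conflict line can be matched with a regular expression. This is a minimal Python sketch: the pattern is written against the single error line quoted above, so wording in other releases may not match, and find_conflicts and scan_log are hypothetical helper names, not part of Contrail.

```python
import re
from typing import Iterable, List, Tuple

# Pattern based only on the error line quoted in this KB (an assumption for
# other releases). The greedy (.*) stops at the last "." on the line, so a
# value containing colons, such as a fully qualified name, is kept whole.
CONFLICT_RE = re.compile(
    r"For index (\d+) reserve conflicts with existing value (.*)\."
)


def find_conflicts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (index, existing value) for every reserve-conflict line."""
    conflicts = []
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            conflicts.append((int(match.group(1)), match.group(2)))
    return conflicts


def scan_log(path: str = "/var/log/contrail/schema-zk.log") -> List[Tuple[int, str]]:
    """Scan the schema zookeeper log named in this KB for conflict lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_conflicts(handle)
```

Calling scan_log() on an affected node would return entries such as (8000000, 'default-domain:default-project:ip-fabric:ip-fabric') for the log above; an empty list means the pattern did not match anything.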
Cause:
In Contrail 3.x, system-defined route targets are saved at /id/bgp/route-targets/0008000000.
In Contrail 1909 and later, as part of 4-byte ASN support, they are saved under /id/bgp/route-targets/type0/0008000000 and /id/bgp/route-targets/type1_2/0008000000, depending on how the 8 bytes of the route target are split:
8 bytes = 2 (reserve) + 2 (ASN) + 4 (RT) -> type 0
8 bytes = 2 (reserve) + 4 (ASN) + 2 (RT) -> type 1_2
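The two byte layouts can be checked with Python's struct module. This is an illustrative sketch only: pack_type0 and pack_type1_2 are hypothetical names, and the 2-byte "reserve" field is treated as an opaque number because the KB does not define its contents. (These layouts appear to match BGP extended community types 0 and 1/2, where the leading two bytes carry type and subtype, but that mapping is not stated in this KB.)

```python
import struct

# "!" = network byte order with standard sizes and no padding, so each
# format is exactly 2 + 2 + 4 or 2 + 4 + 2 = 8 bytes.
TYPE0_FORMAT = "!HHI"    # 2-byte reserve, 2-byte ASN, 4-byte RT number
TYPE1_2_FORMAT = "!HIH"  # 2-byte reserve, 4-byte ASN, 2-byte RT number


def _check(name: str, value: int, width: int) -> None:
    """Raise ValueError if value does not fit in `width` unsigned bytes."""
    if not 0 <= value < 1 << (8 * width):
        raise ValueError(f"{name}={value} does not fit in {width} bytes")


def pack_type0(reserve: int, asn: int, rt: int) -> bytes:
    """Pack a type 0 route target: 2-byte ASN and 4-byte RT number."""
    _check("reserve", reserve, 2)
    _check("asn", asn, 2)
    _check("rt", rt, 4)
    return struct.pack(TYPE0_FORMAT, reserve, asn, rt)


def pack_type1_2(reserve: int, asn: int, rt: int) -> bytes:
    """Pack a type 1_2 route target: 4-byte ASN and 2-byte RT number."""
    _check("reserve", reserve, 2)
    _check("asn", asn, 4)
    _check("rt", rt, 2)
    return struct.pack(TYPE1_2_FORMAT, reserve, asn, rt)
```

With example values, pack_type0(0, 64512, 8000000).hex() gives '0000fc00007a1200'. Note that 8000000, the index from the log above, does not fit in the 2-byte RT field of the type 1_2 layout.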

As part of the upgrade, all entries such as /id/bgp/route-targets/0008000000 are converted into /id/bgp/route-targets/type0/0008000000 or /id/bgp/route-targets/type1_2/0008000000.
The issue occurs because the API service tries to assign previously used route target entries to a VM while the schema conversion is still in progress, which causes a conflict.
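The path change described above can be sketched as a pure function. Several assumptions are inferred from the paths in this KB rather than from Contrail source: node names are the decimal index zero-padded to 10 digits (8000000 becomes 0008000000), and the target subtree is chosen by whether the ASN fits in 2 bytes. zk_node_name, choose_layout and converted_path are hypothetical names.

```python
OLD_ROOT = "/id/bgp/route-targets"
LAYOUTS = ("type0", "type1_2")


def zk_node_name(index: int) -> str:
    """Format an RT index as a node name, e.g. 8000000 -> '0008000000'.

    The 10-digit zero padding is inferred from the paths in this KB.
    """
    return f"{index:010d}"


def choose_layout(asn: int) -> str:
    """Hypothetical rule: 2-byte ASNs map to type0, larger ASNs to type1_2."""
    return "type0" if asn <= 0xFFFF else "type1_2"


def converted_path(old_path: str, layout: str) -> str:
    """Map a 3.x path such as /id/bgp/route-targets/0008000000 to its 1909 form."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout {layout!r}")
    parent, _, node = old_path.rpartition("/")
    if parent != OLD_ROOT or not node.isdigit():
        raise ValueError(f"not a 3.x route-target path: {old_path!r}")
    return f"{OLD_ROOT}/{layout}/{node}"
```

For example, converted_path("/id/bgp/route-targets/0008000000", "type0") returns "/id/bgp/route-targets/type0/0008000000", the same node the schema transformer reads in the log above.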
Solution:
The following are two proactive approaches to avoid this issue. If you are already facing this issue, contact JTAC.

Approach 1 - Use probe logic for contrail schema, contrail service monitor, and contrail device manager (on all three nodes in a multinode setup)
(Applicable to all future ISSU upgrades, from R1909 to R190x)

     After "docker-compose up", it is recommended to implement probe logic based on contrail-status to ensure that the services stay "Active" for 3 x SCHEMA REINIT COMPLETE TIME (defined below).

     SCHEMA REINIT COMPLETE TIME: measure this manually once on the cluster.
    - Perform an ISSU upgrade with the cluster config, and observe how long the schema transformer container takes to complete its INIT process after the "docker-compose up" step.
    - The INIT process is complete when the old zookeeper path /id/bgp/route-targets contains no entries other than type0 and the logs in /var/log/contrail/schema-zk.log settle down.
    - This is a one-time operation to find the time taken to complete REINIT; the result is then used as the reference for the probe calculation below.
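The KB does not define when the logs "settle down". One way to time it, assumed here, is to treat the log as settled once /var/log/contrail/schema-zk.log has not grown for a chosen quiet period, and to report the time of the last growth as the REINIT time. This sketch covers only the log half of the condition; the zookeeper half is the "ls /id/bgp/route-targets" check in Approach 2. measure_reinit_time and its parameters are hypothetical; clock, sleep and size_of are injectable so the loop can be tested without waiting.

```python
import os
import time
from typing import Callable


def measure_reinit_time(
    log_path: str = "/var/log/contrail/schema-zk.log",
    quiet_seconds: float = 60.0,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    size_of: Callable[[str], int] = os.path.getsize,
) -> float:
    """Return seconds from the call until the log last grew.

    The log counts as settled once its size is unchanged for quiet_seconds
    (an assumed heuristic). Start this right after "docker-compose up".
    """
    start = clock()
    last_size = size_of(log_path)
    last_change = start
    while True:
        now = clock()
        if now - last_change >= quiet_seconds:
            return last_change - start  # time at which activity stopped
        if now - start >= timeout_seconds:
            raise TimeoutError(f"{log_path} still growing after {timeout_seconds}s")
        sleep(poll_seconds)
        size = size_of(log_path)
        if size != last_size:  # growth (or rotation) resets the quiet timer
            last_size = size
            last_change = clock()
```

The quiet period is a trade-off: too short and a pause in conversion is mistaken for completion, too long and the measurement simply takes longer without changing the result.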
 
    Example:
      If the SCHEMA REINIT COMPLETE TIME is 300 seconds, the maximum probe time is 900 seconds. If the schema status stays active for the first 300 seconds, move to the next step (#2).
      If the active schema transformer becomes backup, check the newly active schema transformer for another 300 seconds (300-600 seconds), and so on.
      If the schema status has not stayed active for a full 300 seconds by the 900-second mark, stop the ISSU process and contact JTAC.
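The probe rule in this example can be sketched as a polling loop: pass once the same node has reported the service as active for one full SCHEMA REINIT COMPLETE TIME window, restart the window on any failover, and give up at three windows. The parser assumes contrail-status prints per-service lines such as "schema: active"; collecting that output from each node (for example over SSH) is left to the caller. parse_contrail_status, active_node and probe are hypothetical names.

```python
import time
from typing import Callable, Dict, Mapping, Optional


def parse_contrail_status(text: str) -> Dict[str, str]:
    """Collect 'service: state' lines from contrail-status output.

    Assumed format, e.g. "schema: active". Table rows whose first column
    contains spaces (such as image names with ":tag") are skipped.
    """
    states = {}
    for line in text.splitlines():
        name, sep, state = line.partition(":")
        name, state = name.strip(), state.strip()
        if sep and name and state and " " not in name:
            states[name] = state
    return states


def active_node(outputs: Mapping[str, str], service: str = "schema") -> Optional[str]:
    """Return the single node whose contrail-status shows `service` active."""
    active = [node for node, text in outputs.items()
              if parse_contrail_status(text).get(service) == "active"]
    return active[0] if len(active) == 1 else None


def probe(
    get_active_node: Callable[[], Optional[str]],
    reinit_seconds: float,
    poll_seconds: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """True once one node stays active for reinit_seconds; False after 3x."""
    max_seconds = 3 * reinit_seconds
    start = clock()
    current = get_active_node()
    window_start = start
    while True:
        now = clock()
        if current is not None and now - window_start >= reinit_seconds:
            return True   # stable for a full window: move to the next step
        if now - start >= max_seconds:
            return False  # stop the ISSU process and contact JTAC
        sleep(poll_seconds)
        node = get_active_node()
        if node is None or node != current:
            current = node           # failover or no active node:
            window_start = clock()   # restart the window from now
```

Run the loop once per watched service (schema, svc-monitor, device-manager), since each service elects its own active node and they need not be on the same node.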
      
Approach 2 - Run the following procedure on any one of the zookeeper nodes.
(Specific to the current 3.2.x to R1909 upgrade)

         Log in to the zookeeper database container:
             docker exec -it config_database_zookeeper_1 bash
         Log in to the zk client:
             /zookeeper-3.4.10/bin/zkCli.sh -server <IP address of the node>:2181
         Check the zookeeper node below by running the "ls" command:
             ls /id/bgp/route-targets
         Expected output:
             [type0]

If entries other than "type0" are listed, wait a few seconds and retry until all old zookeeper entries are cleared and only "type0" remains. Determine the maximum wait time by running the procedure manually once, or set a generous limit such as 600 seconds. If the zookeeper entries are not cleared within that time, contact JTAC.
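The retry loop described above can be scripted. This sketch assumes that zkCli.sh in the config_database_zookeeper_1 container accepts the ls command as trailing arguments (non-interactive mode) and prints the children as a bracketed list such as [type0]; verify both on your release before relying on it. parse_zkcli_ls, zkcli_lister and wait_for_only_type0 are hypothetical names, and the 600-second default is the example limit from the text.

```python
import subprocess
import time
from typing import Callable, List


def parse_zkcli_ls(output: str) -> List[str]:
    """Return the children from zkCli 'ls' output, e.g. '[type0]' -> ['type0']."""
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            inner = line[1:-1].strip()
            return [item.strip() for item in inner.split(",")] if inner else []
    raise ValueError("no child list found in zkCli output")


def zkcli_lister(
    server: str,
    container: str = "config_database_zookeeper_1",
    zk_home: str = "/zookeeper-3.4.10",
    path: str = "/id/bgp/route-targets",
) -> Callable[[], List[str]]:
    """Build a function that runs 'ls <path>' through docker exec and zkCli.sh."""
    cmd = ["docker", "exec", container, f"{zk_home}/bin/zkCli.sh",
           "-server", server, "ls", path]

    def list_children() -> List[str]:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return parse_zkcli_ls(result.stdout)

    return list_children


def wait_for_only_type0(
    list_children: Callable[[], List[str]],
    timeout_seconds: float = 600.0,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until only 'type0' remains; False means contact JTAC."""
    deadline = clock() + timeout_seconds
    while True:
        if list_children() == ["type0"]:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

A typical call would be wait_for_only_type0(zkcli_lister("<IP address of the node>:2181")), with the placeholder replaced by the zookeeper node's address as in the manual procedure.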