[Contrail] Schema Crash [ERROR]: Cassandra connection down. Exception in unbound method ColumnFamily.multiget

Article ID: KB36309 KB Last Updated: 10 Feb 2021 Version: 1.0
Summary:
 

This article explains why the Contrail schema process crashes on all nodes and details the steps for resolution.

Symptoms:
 

The Contrail schema process crashes repeatedly on all nodes, with the following error in the contrail-schema logs.

 
11/02/2020 11:10:28 PM [contrail-schema] [WARNING]: RabbitMQ publish connection ESTABLISHED <Connection: amqp://guest:**@X.X.X.X:5672// at 0x7feeaa40f510>
11/02/2020 11:22:43 PM [contrail-schema] [ERROR]: Cassandra connection down. Exception in <unbound method ColumnFamily.multiget>
11/02/2020 11:22:43 PM [contrail-schema] [DEBUG]: Notification Message: {u'fq_name': [u'8a6f83bb-f156-40a4-b58f-3b6cd231543c'],
u'obj_dict': {u'display_name': u'8a6f83bb-f156-40a4-b58f-3b6cd231543c',
               u'fq_name': [u'8a6f83bb-f156-40a4-b58f-3b6cd231543c'],
               u'id_perms': {u'created': u'2020-11-02T23:10:40.237418',
                             u'creator': None,
                             u'description': None,
                             u'enable': True,
                             u'last_modified': u'2020-11-02T23:10:40.237418',
                             u'permissions': {u'group': u'cloud-admin-group',
                                              u'group_access': 7,
                                              u'other_access': 7,
                                              u'owner': u'cloud-admin',
                                              u'owner_access': 7},
                             u'user_visible': True,
                             u'uuid': {u'uuid_lslong': 9912825433908596324L,
                                       u'uuid_mslong': 14189465487881094428L}},
               u'perms2': {u'global_access': 0,
                           u'owner': u'cloud-admin',
                           u'owner_access': 7,
                           u'share': []},
               u'uuid': u'c4eb1bae-a5d1-451c-8991-6e3634944e64'},
u'oper': u'CREATE',
u'request-id': u'req-e1aafb9c-0bb3-44e0-8e80-fb2347942253',
u'type': u'virtual_machine',
u'uuid': u'c4eb1bae-a5d1-451c-8991-6e3634944e64'}
11/02/2020 11:22:43 PM [contrail-schema] [WARNING]: RabbitMQ drainer connection down

 

Cause:

Even when Cassandra is up, a Cassandra query can fail if its timeout exception is raised more than max_retries times. When this happens, a MaximumRetryException is raised, followed by the log message "[ERROR]: Cassandra connection down. Exception in <unbound method ColumnFamily.multiget>". See the code at https://github.com/Juniper/contrail-controller/blob/R1909.30/src/config/common/cfgm_common/vnc_cassandra.py#L524.
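The retry behavior described above can be illustrated with a minimal Python sketch. This is not the code from vnc_cassandra.py; the function names query_with_retries, slow_multiget, and report are hypothetical, the exception class is a local stand-in, and the built-in TimeoutError stands in for the driver's timeout exception.

```python
class MaximumRetryException(Exception):
    """Local stand-in for the exception named above (assumption for this sketch)."""


def query_with_retries(query, max_retries=3):
    """Call query(); retry on timeout and give up once timeouts exceed max_retries."""
    timeouts = 0
    while True:
        try:
            return query()
        except TimeoutError:
            timeouts += 1
            if timeouts > max_retries:
                raise MaximumRetryException(
                    "query timed out %d times (max_retries=%d)"
                    % (timeouts, max_retries)
                )


def slow_multiget():
    # Hypothetical query that always times out, for example while the
    # database is busy scanning a large number of tombstones.
    raise TimeoutError("read timed out")


def report(query, max_retries=3):
    """Return the status a caller might log after running the query."""
    try:
        query_with_retries(query, max_retries)
        return "ok"
    except MaximumRetryException:
        # The database process is up, but the caller still sees it as down.
        return "Cassandra connection down"


print(report(slow_multiget))
```

The point of the sketch is that a healthy but slow database is indistinguishable, from the caller's side, from one that is down once the retry budget is exhausted.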

To determine the cause, perform the following:

  1. Check the Contrail schema introspect for the process state of the Cassandra database and ZooKeeper by using the following URL:

http://<active schema node>:8087/Snh_SandeshUVECacheReq?x=NodeStatus

Look for any process that is listed as Non-Functional.

Note: The Contrail API reads through all the records, including tombstones (deleted records), before returning the list of virtual networks (VNs). A high number of tombstone cells can delay the list operation.

  2. Check the Cassandra error log (system.log) for tombstone warnings. In this case, it showed a read query scanning 176915 tombstone cells, which caused the API to time out.
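The log check in the last step can be automated with a short Python sketch. It assumes that the relevant system.log lines contain the phrase "<number> tombstone cells"; the exact wording varies by Cassandra version, so the sample line below is illustrative only, and the function name max_tombstones is hypothetical.

```python
import re

# Assumption: tombstone messages include "<number> tombstone cells".
TOMBSTONE_RE = re.compile(r"(\d+) tombstone cells")


def max_tombstones(log_lines):
    """Return the largest tombstone count reported in the given log lines."""
    counts = [
        int(match.group(1))
        for line in log_lines
        for match in TOMBSTONE_RE.finditer(line)
    ]
    return max(counts, default=0)


# Illustrative line only; real log lines carry timestamps and query details.
sample = [
    "Read 0 live rows and 176915 tombstone cells for query ...",
    "some unrelated log line",
]
print(max_tombstones(sample))
```

To scan a real log, pass an open file object instead of the sample list, for example max_tombstones(open(path)) where path is the location of system.log in your environment.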

 

Solution:
 

To resolve this issue, purge all tombstone records by setting gc_grace_seconds to 0. (Note: For detailed instructions, refer to KB35700 - [Contrail] Database: Cassandra connection down, IFMap Server End-Of-RIB not computed, No BGP configuration for self.)

For Cassandra config database:

  1. Enter the container (config_database_cassandra_1).

Note: The container name may vary depending on the environment used for installation.

# docker exec -it config_database_cassandra_1 /bin/bash

  2. Connect to the Contrail config database.

# cqlsh <controller ip> 9041
Connected to contrail_database at X.X.X.X:9041.

  3. Check the current values:

cqlsh> SELECT table_name,gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='config_db_uuid';

table_name        | gc_grace_seconds
obj_fq_name_table | 864000
obj_shared_table  | 864000
obj_uuid_table    | 864000

  4. Change the values to 0 seconds (for immediate purge):

cqlsh> ALTER TABLE config_db_uuid.obj_fq_name_table WITH gc_grace_seconds = 0;
cqlsh> ALTER TABLE config_db_uuid.obj_shared_table WITH gc_grace_seconds = 0;
cqlsh> ALTER TABLE config_db_uuid.obj_uuid_table WITH gc_grace_seconds = 0;

  5. Watch the Cassandra error logs (locate them with find / -name system.log) and wait for all tombstone records to be deleted.

  6. Refer to KB35700 - [Contrail] Database: Cassandra connection down, IFMap Server End-Of-RIB not computed, No BGP configuration for self, and set gc_grace_seconds as required by your environment.

  7. Restart the contrail-schema container on all nodes and watch contrail-schema.log for the re-initialization process.
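The ALTER TABLE statements used in the steps above, both for the purge and for restoring a value afterward, can be generated with a small Python sketch. The keyspace and table names are taken from the cqlsh output shown earlier; the helper name alter_statements is hypothetical, and 864000 is used only because it is the value that output shows. Choose the final value according to KB35700 and your environment.

```python
# Keyspace and tables as listed in the cqlsh output in this article.
KEYSPACE = "config_db_uuid"
TABLES = ["obj_fq_name_table", "obj_shared_table", "obj_uuid_table"]


def alter_statements(gc_grace_seconds, keyspace=KEYSPACE, tables=TABLES):
    """Build one ALTER TABLE statement per table for the given gc_grace_seconds."""
    if gc_grace_seconds < 0:
        raise ValueError("gc_grace_seconds must not be negative")
    return [
        "ALTER TABLE %s.%s WITH gc_grace_seconds = %d;"
        % (keyspace, table, gc_grace_seconds)
        for table in tables
    ]


# Statements for the purge step.
for statement in alter_statements(0):
    print(statement)

# Statements for restoring the value shown earlier (example value only).
for statement in alter_statements(864000):
    print(statement)
```

The printed statements can be pasted into the cqlsh session. Generating them from one table list reduces the chance of restoring the setting on only some of the tables.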

 
