[Contrail] Database: Cassandra connection down, IFMap Server End-Of-RIB not computed, No BGP configuration for self

Article ID: KB35700 KB Last Updated: 14 May 2020 Version: 1.0
Summary:

This article discusses a scenario where the control process is stuck in the initializing state because a query exceeded tombstone_failure_threshold in the Cassandra database, and describes a possible remediation.

Symptoms:

Contrail-status output:

Contrail control

control: initializing (Database:Cassandra connection down, IFMap Server End-Of-RIB not computed, No BGP configuration for self)
nodemgr: active
named: active
dns: active
Cassandra: Inactive


Cassandra error logs (system.log and debug.log):

ERROR [SharedPool-Worker-1] 2019-10-19 23:00:11,295 MessageDeliveryTask.java:77 - Scanned over 100001 tombstones in config_db_uuid.obj_fq_name_table; 100000 columns were requested; query aborted (see tombstone_failure_threshold;) 
Cause:

This issue occurs when the number of tombstones scanned by a single query exceeds tombstone_failure_threshold, which defaults to 100,000. When that happens, Cassandra aborts the query. This is a mechanism to prevent one or more nodes from running out of memory and crashing.
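
To confirm this cause on a given node, the aborted queries can be picked out of the Cassandra log. The following is a minimal sketch, assuming the log message format matches the error log excerpt above; the function name tombstone_aborts and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print "<keyspace.table> <tombstones scanned>" for every aborted-query
# line read on stdin. Lines that do not match the format are ignored.
tombstone_aborts() {
    sed -n 's/.*Scanned over \([0-9][0-9]*\) tombstones in \([^;]*\);.*query aborted.*/\2 \1/p'
}

# Hypothetical usage (the log path depends on the installation):
#   tombstone_aborts < /var/log/cassandra/system.log | sort | uniq -c
```

Counting the results per table shows which tables are hitting the threshold most often.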

Solution:

The recommendation is to reduce the default value (10 days) for gc_grace_seconds to 3-5 days (depending on the information below).

gc_grace_seconds controls how long tombstone entries are kept before compaction is allowed to clear them. If it is set to 3-5 days instead of the default 10 days, tombstone entries become eligible for clearing roughly 2 to 3 times sooner (10/5 = 2, 10/3 ≈ 3.3).
Note that with this approach, if a database node goes down, it must be brought back within the gc_grace_seconds period. Otherwise the affected node may have to be reseeded (that is, clear the data from the affected node and sync data from the other nodes instead of using the data on the affected node).
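
The day values discussed above translate into gc_grace_seconds values as follows. This is a minimal sketch; the helper name days_to_seconds is hypothetical and is used only for the arithmetic.

```shell
#!/bin/sh
# Convert a retention period in days to a gc_grace_seconds value.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# Default (10 days) and the two ends of the recommended 3-5 day range.
for days in 10 5 3; do
    echo "$days days = $(days_to_seconds "$days") seconds"
done
# 10 days = 864000 seconds
# 5 days = 432000 seconds
# 3 days = 259200 seconds
```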

During a maintenance window, run nodetool compact and then change the gc_grace_seconds value:

docker exec -it config_database_cassandra_1 /bin/bash
nodetool -p 7200 status
nodetool -p 7200 compact
nodetool -p 7200 status    # Ensure everything looks correct

For Cassandra config database: 

  1. Enter the container (config_database_cassandra_1)

    Note: Container name may vary based on different environments used for installation.

    # docker exec -it config_database_cassandra_1 /bin/bash
  2. Connect to the Contrail config database:

    # cqlsh <controller ip> 9041
    Connected to contrail_database at 10.85.216.9:9041.

  3. See the current values:

    cqlsh> SELECT table_name,gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='config_db_uuid';
    
    table_name        | gc_grace_seconds
    obj_fq_name_table | 864000
    obj_shared_table  | 864000
    obj_uuid_table    | 864000
  4. Change gc_grace_seconds to the desired value, for example 259200 seconds (3 days):

    Note: This example shows the value for 3 days; it can be set to anything between 3 and 5 days.

    cqlsh> ALTER TABLE config_db_uuid.obj_fq_name_table WITH gc_grace_seconds = 259200;
    cqlsh> ALTER TABLE config_db_uuid.obj_shared_table WITH gc_grace_seconds = 259200;
    cqlsh> ALTER TABLE config_db_uuid.obj_uuid_table WITH gc_grace_seconds = 259200;
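
When a different value in the 3-5 day range is chosen, the statements in step 4 can be generated rather than typed by hand. This is a minimal sketch, assuming the same three tables shown in step 3; the function name gen_alter_statements and the GC_GRACE variable are hypothetical.

```shell
#!/bin/sh
# Target value in seconds; defaults to 259200 (3 days) when GC_GRACE is unset.
GC_GRACE=${GC_GRACE:-259200}

# Emit one ALTER TABLE statement per config_db_uuid table listed in step 3.
gen_alter_statements() {
    for table in obj_fq_name_table obj_shared_table obj_uuid_table; do
        echo "ALTER TABLE config_db_uuid.$table WITH gc_grace_seconds = $GC_GRACE;"
    done
}

gen_alter_statements
```

The printed statements can be pasted at the cqlsh prompt, or saved to a file and passed to cqlsh with its -f option.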

Verification:

cqlsh> SELECT table_name,gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='config_db_uuid';

table_name        | gc_grace_seconds
obj_fq_name_table | 259200
obj_shared_table  | 259200
obj_uuid_table    | 259200
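
The check above can also be automated by feeding the query output to a small filter. This is a minimal sketch, assuming the output has the "table_name | gc_grace_seconds" layout shown above; the function name check_gc_grace is hypothetical.

```shell
#!/bin/sh
# Read "table_name | gc_grace_seconds" rows on stdin and print every table
# whose value differs from the expected one (first argument).
# Exits non-zero if at least one mismatch is found.
check_gc_grace() {
    awk -F'|' -v want="$1" '
        { gsub(/ /, "", $1); gsub(/ /, "", $2) }
        # Only rows whose second column is purely numeric are data rows,
        # so the header and blank lines are skipped.
        $2 ~ /^[0-9]+$/ && $2 != want { print $1 " has " $2; bad = 1 }
        END { exit bad }
    '
}
```

Silent output and a zero exit status mean every listed table already has the expected value.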