[Contrail] Cassandra down in Contrail analytics nodes due to java.lang.OutOfMemoryError: Java heap space

Article ID: KB35008 KB Last Updated: 07 Dec 2019 Version: 3.0
Summary:

Cassandra frequently runs out of memory on Contrail analytics nodes, which brings the Cassandra service down.

This article applies specifically to Contrail 5.X releases.

Symptoms:

Output of contrail-status:

== Contrail database ==
kafka: active
nodemgr: initializing (Cassandra state detected DOWN. )
zookeeper: active
cassandra: down
Cause:

This issue occurs because Cassandra runs out of Java heap memory. The error messages are logged in system.log and debug.log.

ERROR [MessagingService-Incoming-/X.X.X.X] 2019-07-03 13:32:09,110 CassandraDaemon.java:228 - Exception in thread Thread[MessagingService-Incoming-/X.X.X.X],5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.io.DataInputStream.readUTF(DataInputStream.java:602) ~[na:1.8.0_171]
        at org.apache.cassandra.io.util.RebufferingInputStream.readUTF(RebufferingInputStream.java:263) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.gms.GossipDigestSynSerializer.deserialize(GossipDigestSyn.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.gms.GossipDigestSynSerializer.deserialize(GossipDigestSyn.java:81) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) ~[apache-cassandra-3.11.2.jar:3.11.2]
Solution:

Cassandra error logs are found in 'debug.log' and 'system.log'. Run the following commands on the analytics host to find their exact location:

find / -name debug.log
find / -name system.log

The results are similar to the example below.

Note: The container ID may differ according to the environment.

/var/lib/docker/overlay2/94d77eb0c1efee4c86556963581a798a5a0105322a5b79cfa40c846fac26f4d3/merged/var/log/cassandra/system.log
/var/lib/docker/overlay2/94d77eb0c1efee4c86556963581a798a5a0105322a5b79cfa40c846fac26f4d3/merged/var/log/cassandra/debug.log

Check the entries with the latest timestamps in these files to understand the reason for the failure.
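One way to pull the most recent heap error out of a log file is sketched below. The helper name last_oom is hypothetical and not part of Contrail or Cassandra. It assumes that the timestamped ERROR line sits directly above the java.lang.OutOfMemoryError line, as in the log excerpt in the Cause section, and that grep supports the -B option (GNU and BSD grep do).

```shell
#!/bin/sh
# last_oom FILE
# Print the most recent "Java heap space" error in FILE together with the
# line directly above it (normally the timestamped ERROR line).
# Hypothetical helper for illustration only.
last_oom() {
    # -F: match the text literally; -B 1: also print one line of leading context.
    # tail keeps only the last match and its context line.
    grep -F -B 1 'java.lang.OutOfMemoryError: Java heap space' "$1" | tail -n 2
}
```

Calling last_oom with one of the paths returned by the find commands prints nothing if the file contains no heap error.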

If the error is similar to 'java.lang.OutOfMemoryError: Java heap space', the steps below can be performed:

  1. Log in to the analytics database container using the command: docker exec -it <container name> bash
    Note: The container name may differ according to the environment and release, e.g. analytics_database_cassandra or contrail_analytics_database.

  2. Navigate to /etc/cassandra using the command: cd /etc/cassandra

  3. Edit cassandra-env.sh and search for MAX_HEAP_SIZE and HEAP_NEWSIZE. Uncomment both lines and set them to 16G and 2G respectively.

    Before changes:
    #MAX_HEAP_SIZE="4G"
    #HEAP_NEWSIZE="800M"

    After changes:
    MAX_HEAP_SIZE="16G"
    HEAP_NEWSIZE="2G"

  4. Restart the container using the command: docker restart <container name>

    Repeat this procedure on all Contrail analytics nodes.
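The edit in step 3 could also be scripted rather than done by hand. The following is a minimal sketch, not a Juniper-provided tool: the function name set_heap is hypothetical, and it assumes the two settings start at the beginning of a line, optionally commented out with a single '#', as in the "Before changes" example. Indented assignments elsewhere in the file are left untouched.

```shell
#!/bin/sh
# set_heap FILE MAX NEW
# Uncomment (if needed) and set MAX_HEAP_SIZE and HEAP_NEWSIZE in FILE.
# A backup of the original is kept as FILE.bak.
# Hypothetical helper for illustration only.
set_heap() {
    file=$1
    max=$2
    new=$3
    # ^#\{0,1\} matches an optional leading '#', so both commented and
    # already-active lines are rewritten.
    sed -i.bak \
        -e "s/^#\{0,1\}MAX_HEAP_SIZE=.*/MAX_HEAP_SIZE=\"$max\"/" \
        -e "s/^#\{0,1\}HEAP_NEWSIZE=.*/HEAP_NEWSIZE=\"$new\"/" \
        "$file"
}
```

Inside the container this would be called as set_heap /etc/cassandra/cassandra-env.sh 16G 2G, followed by the container restart from step 4.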

Note: MAX_HEAP_SIZE can be set to at most half of the RAM allocated to the system. HEAP_NEWSIZE can be set to at most one quarter of MAX_HEAP_SIZE.
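The limits in this note can be worked out with simple integer arithmetic. The sketch below uses a hypothetical helper, heap_limits, that takes the RAM size in megabytes and prints the two upper bounds in the format used by cassandra-env.sh. Reading the RAM size from the host is left to the caller, for example from the MemTotal line of /proc/meminfo on Linux.

```shell
#!/bin/sh
# heap_limits RAM_MB
# Print the upper bounds for MAX_HEAP_SIZE (half of RAM) and
# HEAP_NEWSIZE (one quarter of MAX_HEAP_SIZE), both in megabytes.
# Hypothetical helper for illustration only.
heap_limits() {
    ram_mb=$1
    max_heap_mb=$((ram_mb / 2))      # at most half of the system RAM
    heap_new_mb=$((max_heap_mb / 4)) # at most a quarter of the max heap
    printf 'MAX_HEAP_SIZE="%sM"\nHEAP_NEWSIZE="%sM"\n' "$max_heap_mb" "$heap_new_mb"
}
```

For a host with 32 GB of RAM, heap_limits 32768 prints MAX_HEAP_SIZE="16384M" and HEAP_NEWSIZE="4096M", so the 16G and 2G values used in step 3 fall within these bounds.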

For an RHOSP deployment, the following can be included in contrail-services.yaml to make the setting persistent across deployments:

ContrailAnalyticsDatabaseParameters:
  ContrailSettings:
    MAX_HEAP_SIZE: 16G
    HEAP_NEWSIZE: 2G

Note: Ensure that roles_data.yaml contains the matching entry OS::TripleO::Services::ContrailAnalyticsDatabase for this to work.
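A quick way to confirm the entry before deploying is a literal text search. The helper name has_analytics_db_service below is hypothetical. It only checks that the string appears somewhere in the file, so a commented-out entry would also count as a match.

```shell
#!/bin/sh
# has_analytics_db_service FILE
# Succeed (exit status 0) if FILE mentions the Contrail analytics database service.
# Hypothetical helper for illustration only; it does not parse the YAML.
has_analytics_db_service() {
    grep -q -F 'OS::TripleO::Services::ContrailAnalyticsDatabase' "$1"
}
```

For example, has_analytics_db_service roles_data.yaml || echo "service entry missing" prints a warning when the entry is absent.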

Modification History:
2019-09-12: Corrected RHOSP deployment information in the Solution.
2019-12-07: Added note about roles_data.yaml and note about container name.