[Contrail] Cassandra down in Contrail analytics nodes due to java.lang.OutOfMemoryError: Java heap space

Article ID: KB35008  KB Last Updated: 16 Jun 2020  Version: 4.0
Summary:

Cassandra frequently runs out of Java heap memory on Contrail analytics nodes, which causes the Cassandra service to go down.

This article applies specifically to Contrail 5.x releases.

Symptoms:

The contrail-status output reports Cassandra as down:

== Contrail database ==
kafka: active
nodemgr: initializing (Cassandra state detected DOWN. )
zookeeper: active
cassandra: down
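
As a quick check, the Cassandra state can be pulled out of that output with a small shell helper. This is a minimal sketch: `cassandra_state` is a hypothetical function name, and it assumes the `name: state` line format shown above.

```shell
# Hypothetical helper: read contrail-status output on stdin and print the
# state reported on the "cassandra:" line (assumes the "name: state" format).
cassandra_state() {
    awk -F': ' '$1 == "cassandra" { print $2 }'
}

# Example on a live node (not run here):
#   contrail-status | cassandra_state
```

Any value other than `active` on the analytics nodes is a reason to look at the Cassandra logs.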
Cause:

This issue occurs when Cassandra runs out of Java heap memory. The error messages are logged in system.log and debug.log.

ERROR [MessagingService-Incoming-/X.X.X.X] 2019-07-03 13:32:09,110 CassandraDaemon.java:228 - Exception in thread Thread[MessagingService-Incoming-/X.X.X.X],5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.io.DataInputStream.readUTF(DataInputStream.java:602) ~[na:1.8.0_171]
        at org.apache.cassandra.io.util.RebufferingInputStream.readUTF(RebufferingInputStream.java:263) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.gms.GossipDigestSynSerializer.deserialize(GossipDigestSyn.java:92) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.gms.GossipDigestSynSerializer.deserialize(GossipDigestSyn.java:81) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) ~[apache-cassandra-3.11.2.jar:3.11.2]
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) ~[apache-cassandra-3.11.2.jar:3.11.2]
Solution:

Cassandra error logs are written to debug.log and system.log. Run the following commands on the analytics host to find their exact location:

find / -name debug.log
find / -name system.log

The results are similar to the example below. Note that the container ID may differ depending on the environment.

/var/lib/docker/overlay2/94d77eb0c1efee4c86556963581a798a5a0105322a5b79cfa40c846fac26f4d3/merged/var/log/cassandra/system.log
/var/lib/docker/overlay2/94d77eb0c1efee4c86556963581a798a5a0105322a5b79cfa40c846fac26f4d3/merged/var/log/cassandra/debug.log

Check the entries with the latest timestamps in these files to understand the reason for the failure.
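
To locate the most recent heap error in one of these logs, something like the following could be used. It is a sketch only: `last_heap_oom` is a hypothetical helper name, and it assumes the timestamped ERROR line sits directly above the exception line, as in the log excerpt in the Cause section.

```shell
# Hypothetical helper: print the last "Java heap space" error in a log file,
# together with the line above it (which carries the timestamp).
# Usage: last_heap_oom /path/to/system.log
last_heap_oom() {
    grep -F -B1 'java.lang.OutOfMemoryError: Java heap space' "$1" | tail -n 2
}
```

If the function prints nothing, the log contains no heap-space error and the outage likely has a different cause.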

If the error is similar to 'java.lang.OutOfMemoryError: Java heap space', the steps below can be performed:

  1. Find cassandra-env.sh on the host using the command: find / -name cassandra-env.sh

    The output will be similar to the following:

    # find / -name cassandra-env.sh
    /var/lib/docker/overlay2/1b13905b002cd0cce3d3f57dbe38fb3befe1e428f5d2f9ae6d75f2dae6ab490a/diff/etc/cassandra/cassandra-env.sh
    /var/lib/docker/overlay2/a61496cf74de98b6a4488bddeaf0b752227e86b81583da83b653ef705c14bd42/diff/etc/cassandra/cassandra-env.sh
    /var/lib/docker/overlay2/7cba59291f2826e99ec9ab96193e39f30a0d81d09ac28f7827a0b4be4145c0cc/merged/etc/cassandra/cassandra-env.sh
    /var/lib/docker/overlay2/2bc8ce94b1969cdc5a2c150bb86d91d30742172ffa3e286c4f1a68ea9b31aa62/merged/etc/cassandra/cassandra-env.sh
    /var/lib/docker/overlay2/b1f800b9c3d49d5cc8b2b50a2ef954959cd02c2a3866de325373a1d754fa4a42/diff/etc/cassandra/cassandra-env.sh
  2. Open /var/lib/docker/overlay2/<container_id>/merged/etc/cassandra/cassandra-env.sh for editing. There are two cassandra-env.sh files, belonging to different containers (for example, analytics_database_cassandra or contrail_analytics_database); use the contrail-status output to determine which one to edit.

  3. In cassandra-env.sh, search for MAX_HEAP_SIZE and HEAP_NEWSIZE. Uncomment both lines and set them to 16G and 2G respectively.

    Before changes:
    #MAX_HEAP_SIZE="4G"
    #HEAP_NEWSIZE="800M"

     
    After changes:
    MAX_HEAP_SIZE="16G"
    HEAP_NEWSIZE="2G"
  4. Restart the Docker container: docker restart <container name>

    Repeat this procedure on all Contrail analytics nodes.
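
Steps 2 to 4 can be sketched as shell helpers. This is an illustration under stated assumptions, not the article's own tooling: `merged_dir_of` and `set_heap_sizes` are hypothetical names, the `docker inspect` template assumes the overlay2 storage driver seen in the paths above, and the `sed` patterns assume the two variables start at the beginning of a line, commented or not, as in the "Before changes" example.

```shell
# Hypothetical helper: print the overlay2 "merged" directory of a container,
# which avoids guessing among the paths returned by find.
# Assumes the overlay2 storage driver.
merged_dir_of() {
    docker inspect --format '{{.GraphDriver.Data.MergedDir}}' "$1"
}

# Hypothetical helper: uncomment and set both heap variables in place.
# A backup of the original file is kept next to it with a .bak suffix.
# Usage: set_heap_sizes <path to cassandra-env.sh> <max heap> <new size>
set_heap_sizes() {
    sed -i.bak \
        -e "s/^#\{0,1\}MAX_HEAP_SIZE=.*/MAX_HEAP_SIZE=\"$2\"/" \
        -e "s/^#\{0,1\}HEAP_NEWSIZE=.*/HEAP_NEWSIZE=\"$3\"/" \
        "$1"
}

# Example on an analytics node (not run here; <container name> as in step 4):
#   env_file="$(merged_dir_of <container name>)/etc/cassandra/cassandra-env.sh"
#   set_heap_sizes "$env_file" 16G 2G
#   docker restart <container name>
```

Keeping the .bak copy makes it easy to compare the file before and after the change, or to roll back.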

Note: MAX_HEAP_SIZE can be increased to at most half of the RAM allocated to the system. HEAP_NEWSIZE can be at most one quarter of MAX_HEAP_SIZE.
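
The limits in this note reduce to simple arithmetic. The sketch below computes the upper bounds in megabytes from the system RAM; `heap_upper_bounds` is a hypothetical helper name, and the values it prints are ceilings, not recommended settings.

```shell
# Hypothetical helper: given total RAM in MB, print the largest values the
# note allows: MAX_HEAP_SIZE = RAM / 2, HEAP_NEWSIZE = MAX_HEAP_SIZE / 4.
heap_upper_bounds() {
    max_heap_mb=$(( $1 / 2 ))
    new_size_mb=$(( max_heap_mb / 4 ))
    echo "MAX_HEAP_SIZE=${max_heap_mb}M HEAP_NEWSIZE=${new_size_mb}M"
}

# Example on a Linux host (not run here), reading RAM from /proc/meminfo:
#   heap_upper_bounds "$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)"
```

For a node with 64 GB of RAM (65536 MB), the bounds are 32768M and 8192M, so the 16G and 2G values used in the procedure fit comfortably within them.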

For an RHOSP deployment, the following can be included in contrail-services.yaml so that the settings persist across deployments:

ContrailAnalyticsDatabaseParameters:
  ContrailSettings:
    MAX_HEAP_SIZE: 16G
    HEAP_NEWSIZE: 2G

Note: For this to work, ensure that roles_data.yaml contains the matching OS::TripleO::Services::ContrailAnalyticsDatabase entry.
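
A quick way to confirm the entry is present is a fixed-string search. This is a minimal sketch: `has_analytics_db_service` is a hypothetical helper name, and the path to roles_data.yaml depends on the deployment.

```shell
# Hypothetical helper: succeed (exit status 0) if the given roles file
# mentions the Contrail analytics database service.
# Usage: has_analytics_db_service /path/to/roles_data.yaml
has_analytics_db_service() {
    grep -qF 'OS::TripleO::Services::ContrailAnalyticsDatabase' "$1"
}
```

The check only confirms that the string appears somewhere in the file; it does not verify that it is listed under the intended role.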

Modification History:
2019-09-12: Corrected RHOSP deployment information in the Solution.
2019-12-07: Added note about roles_data.yaml and note about container name.
2020-10-06: cassandra-env.sh section improved for clarity.