
[Contrail] 1909 - channel error: 405: AMQP_QUEUE_DELETE_METHOD caused: RESOURCE_LOCKED - cannot obtain exclusive access to locked queue


Article ID: KB35198  |  Last Updated: 11 Aug 2020  |  Version: 3.0
Summary:

This article describes an issue seen in Contrail Release 1909 in which rabbitmq queues are not cleared when Contrail Controller services are interrupted.

 

Symptoms:

Impacted services include Contrail Controllers, rabbitmq, and Contrail Collectors.

Snippet of contrail-status output:

== Contrail control ==
control: initializing (Number of connections:2, Expected:3 Missing: Database:Cassandra, IFMap Server End-Of-RIB not computed, No BGP configuration for self)
nodemgr: active
named: active
dns: active

== Contrail analytics-alarm ==
nodemgr: active
kafka: active
alarm-gen: active

== Contrail analytics ==
nodemgr: active
api: active
collector: initializing (Number of connections:5, Expected:6 Missing: Database:Cassandra)

== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active

== Contrail webui ==
web: active
job: active

== Contrail device-manager ==

== Contrail config ==
svc-monitor: active
nodemgr: active
device-manager: active
api: active
schema: backup

 

Cause:

Exclusive queues are not deleted when Contrail Controller services are restarted; they are cleared only when rabbitmq itself is manually stopped.

Release 1909 uses exclusive queues.

From Release 1910 onward, exclusive queues have been removed.

https://review.opencontrail.org/c/Juniper/contrail-common/+/54478

For more information about exclusive queues, refer to https://www.rabbitmq.com/queues.html.
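
To check whether exclusive queues are present on a node, the queues can be listed from inside the rabbitmq container. The command below is an illustrative sketch and is not part of the original procedure; exclusive queues report a non-empty owner_pid, and the available info items depend on the RabbitMQ version.

    docker exec config_database_rabbitmq_1 rabbitmqctl list_queues name owner_pid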

 

Solution:

How to verify this issue?

Check the error logs of the container control_control_1 and look for messages matching the following:

Logs can be checked by using docker logs control_control_1, or the log file location can be found with docker inspect control_control_1 | grep log.

RabbitMQ SM: Caught exception while connecting to RabbitMQ: <ip>:5672 : channel error: 405: AMQP_QUEUE_DELETE_METHOD caused: RESOURCE_LOCKED - cannot obtain exclusive access to locked queue 'contrail-control.ip-ip' in vhost '/'
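
For example, the error string can be filtered for directly; the grep pattern below is illustrative:

    docker logs control_control_1 2>&1 | grep "RESOURCE_LOCKED"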

This issue has been fixed in Contrail release 1910.

If you are facing this issue in release 1909, the following can be implemented as a workaround:

Note: In a multi-node setup, this needs to be executed on all Contrail Controller nodes.

  1. Stop the rabbitmq process from the host by using docker exec config_database_rabbitmq_1 rabbitmqctl stop (on all three nodes in the case of HA).

  2. Find the mount source location of mnesia for the config_database_rabbitmq_1 container on the specific node by using docker inspect config_database_rabbitmq_1 | less (on all three nodes in the case of HA). In the following example, it is /var/lib/docker/volumes/cb6adb9013d3e9b21fa918cc34465bd7fc65681221e3946329b704ce625e40f1/_data. A filtered docker inspect command is sketched after the mount listing below.

 

        "Mounts": [
            {
                "Type": "bind",
                "Source": "/etc/contrail/ssl",
                "Destination": "/etc/contrail/ssl",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "cb6adb9013d3e9b21fa918cc34465bd7fc65681221e3946329b704ce625e40f1",
                "Source": "/var/lib/docker/volumes/cb6adb9013d3e9b21fa918cc34465bd7fc65681221e3946329b704ce625e40f1/_data",
                "Destination": "/var/lib/rabbitmq",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
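
To extract just the mount source without paging through the full output, a Go template filter can be used with docker inspect. This is an illustrative sketch using the container name and destination path from the steps above:

    docker inspect -f '{{ range .Mounts }}{{ if eq .Destination "/var/lib/rabbitmq" }}{{ .Source }}{{ end }}{{ end }}' config_database_rabbitmq_1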
 
  3. Change directory: cd /var/lib/docker/volumes/cb6adb9013d3e9b21fa918cc34465bd7fc65681221e3946329b704ce625e40f1/_data.

  4. Change directory: cd mnesia.

  5. Remove everything inside the mnesia directory (on all three nodes in the case of HA). Note: Do not delete the mnesia directory itself.

  6. Start config_database_rabbitmq_1 by using docker start config_database_rabbitmq_1 (on all three nodes in the case of HA).

  7. Restart the docker container control_control_1 by using the command docker restart control_control_1 in a rolling manner (on all nodes) to avoid any downtime.

In some scenarios, it has been observed that configuration changes made while rabbitmq is down are not notified to the control node, which leaves the control node with stale data. Restarting the control_control_1 container is therefore recommended because it forces the control node to re-read the configuration from the database. A consolidated sketch of the workaround commands is shown below.
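
The following is a minimal, untested sketch of the above sequence for a single node. The MNESIA_VOL value is a placeholder that must be replaced with the actual mount source found via docker inspect; in an HA setup, the rabbitmq steps are performed on all three controller nodes before the control containers are restarted in a rolling manner.

    # Step 1: stop the rabbitmq process inside its container
    docker exec config_database_rabbitmq_1 rabbitmqctl stop

    # Step 2: placeholder for the mount source of /var/lib/rabbitmq (node-specific)
    MNESIA_VOL=/var/lib/docker/volumes/<volume-id>/_data

    # Steps 3-5: clear the contents of the mnesia directory but keep the directory itself
    find ${MNESIA_VOL}/mnesia -mindepth 1 -delete

    # Step 6: start the rabbitmq container again
    docker start config_database_rabbitmq_1

    # Step 7: restart the control container (rolling restart across nodes to avoid downtime)
    docker restart control_control_1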

 

Modification History:
  • 2020-08-11: Corrected commit link and added note about mnesia

  • 2019-12-30: Note about restart of control_control_1 container added to the Solution section

 
