[Archive] [Contrail] nodemgr fails to come up after upgrade to 3.2.10.0/4.0.3.0/4.1.1.0 using Fuel

Article ID: KB33007 KB Last Updated: 22 May 2020 Version: 2.0

Summary:

Releases 3.2.10.0, 4.0.3.0, and 4.1.1.0 introduce a behavior change: nodemgr processes no longer run as 'root' and instead run as the 'contrail' user. As a result, when users upgrade from a previous release using server manager scripts such as Fuel, nodemgr may fail to come up if the provisioning script does not insert an extra 'chown=contrail:contrail' line into all existing supervisord config files. Upgrades performed with the FAB script are not affected.

Symptoms:

After an upgrade to 3.2.10.0/4.0.3.0/4.1.1.0 using Fuel completes, the nodemgr processes fail to come up:

contrail-status -d

== Contrail Control ==
supervisor-control: active
contrail-control initializing (Number of connections:4, Expected:5 Missing: IFMap:IFMapServer)pid 3323, uptime 1:36:22
contrail-control-nodemgr failed Exited too quickly (process log may have details)
contrail-dns active pid 3324, uptime 1:36:22
contrail-named active pid 3325, uptime 1:36:22

== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing (Collector, Discovery:Collector[Subscribe - Status Code 503] connection down)pid 3329, uptime 1:36:23
contrail-config-nodemgr failed Exited too quickly (process log may have details)
contrail-device-manager backup pid 9585, uptime 1:34:53
contrail-discovery:0 initializing (Collector, Discovery:Collector[Subscribe - Connection Error] connection down)pid 3328, uptime 1:36:23
contrail-schema backup pid 9580, uptime 1:34:53
contrail-svc-monitor backup pid 9595, uptime 1:34:53
ifmap active pid 3327, uptime 1:36:23

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active pid 3331, uptime 1:36:23
contrail-webui-middleware active pid 3333, uptime 1:36:23

== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr failed Exited too quickly (process log may have details)
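
The failed entries in output like the above can be picked out with a short script. The following is a minimal sketch, assuming the text of `contrail-status -d` has already been captured as a string; the function name `failed_nodemgrs` is hypothetical and not part of Contrail.

```python
def failed_nodemgrs(status_text):
    """Return names of nodemgr processes that contrail-status reports as failed."""
    failed = []
    for line in status_text.splitlines():
        fields = line.split()
        # Process lines look like "<process-name> <state> ...".
        # Section headers ("== ... ==") and blank lines never match.
        if len(fields) >= 2 and fields[0].endswith("-nodemgr") and fields[1] == "failed":
            failed.append(fields[0])
    return failed


# Excerpt of the output shown in this article.
sample = """\
== Contrail Control ==
supervisor-control: active
contrail-control-nodemgr failed Exited too quickly (process log may have details)
contrail-dns active pid 3324, uptime 1:36:22
"""

print(failed_nodemgrs(sample))  # ['contrail-control-nodemgr']
```
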
 

The corresponding nodemgr logs show the following error, which indicates a permission failure:

Traceback (most recent call last):
File "/usr/bin/contrail-nodemgr", line 9, in <module>
load_entry_point('nodemgr==0.1dev', 'console_scripts', 'contrail-nodemgr')()
File "/usr/lib/python2.7/dist-packages/nodemgr/main.py", line 215, in main
**dss_kwargs)
File "/usr/lib/python2.7/dist-packages/nodemgr/database_nodemgr/database_event_manager.py", line 69, in __init__
self.add_current_process()
File "/usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py", line 117, in add_current_process
self.process_state_db = self.get_current_process()
File "/usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py", line 96, in get_current_process
for proc_info in proxy.supervisor.getAllProcessInfo():
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 1017, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 826, in send
self.connect()
File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 481, in connect
self.sock.connect(self.socketfile)
File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 349, in connect
raise error(result, strerror(result))
socket.error: [Errno 13] Permission denied
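
The traceback ends in a connect() call on the supervisor socket file, so the ownership and mode of that file are worth inspecting. The following is a minimal sketch of such a check, written for Python 3 and meant to be run separately from nodemgr; the helper name `describe_socket` is hypothetical, and the actual socket path must be taken from the 'file=' setting of the relevant supervisord config file. A temporary file stands in for the socket here.

```python
import os
import stat
import tempfile


def describe_socket(path):
    """Return (uid, gid, mode string) for a file, or None if it does not exist."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return None
    return info.st_uid, info.st_gid, stat.filemode(info.st_mode)


# Demonstration only: a temporary file stands in for the supervisor socket.
with tempfile.NamedTemporaryFile() as demo:
    uid, gid, mode = describe_socket(demo.name)
    print("owner uid=%d gid=%d mode=%s" % (uid, gid, mode))
```

If the reported owner is not the account that nodemgr runs as and the mode grants no access to other users, a permission error on connect is the expected result.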

Cause:

The error is caused by a behavior change in the releases listed above: nodemgr no longer runs as 'root' but as 'contrail', and on customer setups 'contrail' is usually not a sudoer, whereas the nodemgr processes require sudo privileges to run. The behavior change is implemented by adding 'user=contrail' to each nodemgr's ini file.

Example:

root@aio3290:~# cat /etc/contrail/supervisord_config_files/contrail-config-nodemgr.ini |grep user
user=contrail                  ; setuid to this UNIX account to run the program


However, this line is not present in a previous release:

root@aio3035:/etc/contrail# cat /etc/contrail/supervisord_config_files/contrail-config-nodemgr.ini |grep user
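
The same comparison can be made programmatically. The following is a minimal sketch that reads the 'user' setting from the text of a supervisord ini file using Python's standard `configparser`; the section name `[program:contrail-config-nodemgr]` is an assumption about how the file is laid out, and the function name `run_as_users` is hypothetical.

```python
import configparser


def run_as_users(ini_text):
    """Map each section of a supervisord ini file to its 'user' setting, if any."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # strip trailing "; ..." comments
        interpolation=None,
        strict=False,
    )
    parser.read_string(ini_text)
    return {
        section: parser.get(section, "user")
        for section in parser.sections()
        if parser.has_option(section, "user")
    }


# Assumed layout of an upgraded ini file; only the user line is from this article.
upgraded = """\
[program:contrail-config-nodemgr]
user=contrail                  ; setuid to this UNIX account to run the program
"""

# Assumed layout of the same file in a previous release (no user line).
previous = """\
[program:contrail-config-nodemgr]
"""

print(run_as_users(upgraded))  # {'program:contrail-config-nodemgr': 'contrail'}
print(run_as_users(previous))  # {}
```
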


 
Solution:

The solution is to make sure that an extra 'chown=contrail:contrail' line is added to the supervisord config file for each nodemgr (for example, supervisord_config.conf) upon upgrade. The FAB script takes care of this automatically, but customers who use other tools such as Fuel, Juju, or Puppet must have their upgrade scripts add this line.

On a successfully upgraded Contrail cluster, the 'chown=contrail:contrail' line is present in each supervisord config file:

root@aio3290:/etc/contrail# grep --exclude-dir=dir -R -i "chown=contrail:contrail" *
supervisord_analytics.conf:chown=contrail:contrail     ; socket file uid:gid owner
supervisord_config.conf:chown=contrail:contrail     ; socket file uid:gid owner
supervisord_control.conf:chown=contrail:contrail     ; socket file uid:gid owner
supervisord_database.conf:chown=contrail:contrail     ; socket file uid:gid owner
supervisord_vrouter.conf:chown=contrail:contrail     ; socket file uid:gid owner
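
For upgrade scripts that need to add the line themselves, the following is a minimal sketch of an idempotent text edit. It assumes the line belongs in the `[unix_http_server]` section, which is where stock supervisord configuration places the socket 'chown' option; the function name `ensure_socket_chown` and the sample file contents (including the socket path) are hypothetical.

```python
import re

CHOWN_LINE = "chown=contrail:contrail     ; socket file uid:gid owner"


def ensure_socket_chown(conf_text, section="unix_http_server"):
    """Return conf_text with a chown line in [section].

    The text is returned unchanged if the section already has an active
    chown setting or if the section does not exist.
    """
    lines = conf_text.splitlines()
    header = f"[{section}]"
    in_section = False
    end = None  # index of the last non-blank line of the target section
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped == header
            if in_section:
                end = i
            continue
        if in_section:
            if re.match(r"chown\s*=", stripped):
                return conf_text  # already configured; leave the file alone
            if stripped:
                end = i
    if end is None:
        return conf_text  # section not found; nothing to do
    lines.insert(end + 1, CHOWN_LINE)
    return "\n".join(lines) + "\n"


# Hypothetical pre-upgrade config file contents.
sample = """\
[unix_http_server]
file=/tmp/supervisord_config.sock   ; hypothetical socket path
chmod=0700

[supervisord]
logfile=/tmp/supervisord.log
"""

updated = ensure_socket_chown(sample)
print(updated)
```

Calling the function a second time on its own output returns the text unchanged, so it is safe to run on every upgrade. Commented-out settings (lines starting with ';') are not treated as an existing chown line.
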
Modification History:

2020-05-15: Article archived because upgrades via FAB scripts are EOL
