Versions 3.2.10.0, 4.0.3.0, and 4.1.1.0 introduce a behavior change: nodemgr processes no longer run as 'root' and run as the 'contrail' user instead. As a result, when users upgrade from a previous release using server manager scripts such as Fuel, nodemgr may fail to come up unless the Fuel provisioning script inserts an extra line, 'chown=contrail:contrail', into all existing supervisord config files. Upgrades performed with the FAB script are not affected by this failure.
After an upgrade to 3.2.10.0/4.0.3.0/4.1.1.0 using Fuel completes, the nodemgr processes fail to come up, as the following output shows:
contrail-status -d
== Contrail Control ==
supervisor-control: active
contrail-control initializing (Number of connections:4, Expected:5 Missing: IFMap:IFMapServer)pid 3323, uptime 1:36:22
contrail-control-nodemgr failed Exited too quickly (process log may have details)
contrail-dns active pid 3324, uptime 1:36:22
contrail-named active pid 3325, uptime 1:36:22
== Contrail Config ==
supervisor-config: active
contrail-api:0 initializing (Collector, Discovery:Collector[Subscribe - Status Code 503] connection down)pid 3329, uptime 1:36:23
contrail-config-nodemgr failed Exited too quickly (process log may have details)
contrail-device-manager backup pid 9585, uptime 1:34:53
contrail-discovery:0 initializing (Collector, Discovery:Collector[Subscribe - Connection Error] connection down)pid 3328, uptime 1:36:23
contrail-schema backup pid 9580, uptime 1:34:53
contrail-svc-monitor backup pid 9595, uptime 1:34:53
ifmap active pid 3327, uptime 1:36:23
== Contrail Web UI ==
supervisor-webui: active
contrail-webui active pid 3331, uptime 1:36:23
contrail-webui-middleware active pid 3333, uptime 1:36:23
== Contrail Database ==
contrail-database: active
== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr failed Exited too quickly (process log may have details)
The corresponding nodemgr logs show the following error, which indicates a permission failure:
Traceback (most recent call last):
File "/usr/bin/contrail-nodemgr", line 9, in <module>
load_entry_point('nodemgr==0.1dev', 'console_scripts', 'contrail-nodemgr')()
File "/usr/lib/python2.7/dist-packages/nodemgr/main.py", line 215, in main
**dss_kwargs)
File "/usr/lib/python2.7/dist-packages/nodemgr/database_nodemgr/database_event_manager.py", line 69, in __init__
self.add_current_process()
File "/usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py", line 117, in add_current_process
self.process_state_db = self.get_current_process()
File "/usr/lib/python2.7/dist-packages/nodemgr/common/event_manager.py", line 96, in get_current_process
for proc_info in proxy.supervisor.getAllProcessInfo():
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 1017, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 826, in send
self.connect()
File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 481, in connect
self.sock.connect(self.socketfile)
File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 349, in connect
raise error(result, strerror(result))
socket.error: [Errno 13] Permission denied
The error is caused by the behavior change in the aforementioned releases: nodemgr no longer runs as 'root' but as 'contrail', and on customer setups 'contrail' is usually not a sudoer. The nodemgr processes require sudo privilege to run. The behavior change is implemented by adding 'user=contrail' to each nodemgr's ini file.
Example:
root@aio3290:~# cat /etc/contrail/supervisord_config_files/contrail-config-nodemgr.ini |grep user
user=contrail ; setuid to this UNIX account to run the program
However, this line is not present in a previous release, where the same command returns nothing:
root@aio3035:/etc/contrail# cat /etc/contrail/supervisord_config_files/contrail-config-nodemgr.ini |grep user
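The grep check above can also be scripted. The following is a minimal sketch, not part of the product tooling, that reads the text of a nodemgr ini file and reports the 'user' option of each section. The function name program_users, the section name in the sample, and the sample 'command' line are assumptions made for illustration; only the 'user=contrail' line is taken from the output above.

```python
import configparser


def program_users(ini_text):
    """Map each section of a supervisord-style ini file to its 'user' option.

    Sections without a 'user' option map to None, which under supervisord
    means the program inherits the user supervisord itself runs as.
    """
    parser = configparser.ConfigParser(
        interpolation=None,              # leave %(...)s expansions untouched
        delimiters=("=",),               # only '=' separates key and value
        inline_comment_prefixes=(";",),  # strip trailing '; ...' comments
        strict=False,
    )
    parser.read_string(ini_text)
    return {s: parser.get(s, "user", fallback=None) for s in parser.sections()}


# Hypothetical sample: the section name and command are assumed, not copied
# from a real contrail-config-nodemgr.ini.
sample = """[eventlistener:contrail-config-nodemgr]
command=/usr/bin/contrail-nodemgr
user=contrail ; setuid to this UNIX account to run the program
"""

if __name__ == "__main__":
    print(program_users(sample))
```

On an upgraded node the nodemgr section should report 'contrail'. On a node running a previous release it should report None.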
The solution is to make sure that an extra line, 'chown=contrail:contrail', is added to each of the supervisord config files upon upgrade. The FAB script takes care of this automatically, but for customers' preferred scripts, such as Fuel, Juju, or Puppet, the upgrade scripts must add this line themselves.
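For upgrade scripts that need to add the line themselves, the following is a minimal sketch of an idempotent edit. It assumes the 'chown' option belongs in the [unix_http_server] section, which is where supervisord documents it. The function name ensure_chown and the socket path in the sample are hypothetical. A file that already has a 'chown' line, with any value, is left unchanged.

```python
import re


def ensure_chown(conf_text, owner="contrail:contrail"):
    """Return conf_text with a chown line in [unix_http_server].

    The text is returned unchanged if the section is missing or if the
    section already contains a chown option.
    """
    lines = conf_text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines)
                     if line.strip() == "[unix_http_server]")
    except StopIteration:
        return conf_text  # no socket section: leave the file alone

    for line in lines[start + 1:]:
        stripped = line.strip()
        if stripped.startswith("["):
            break  # reached the next section
        if re.match(r"chown\s*=", stripped):
            return conf_text  # already present: nothing to do

    # Insert directly below the section header.
    lines.insert(start + 1, "chown=%s ; socket file uid:gid owner" % owner)
    return "\n".join(lines) + "\n"


# Hypothetical config fragment; the socket path is an assumption.
old = (
    "[unix_http_server]\n"
    "file=/tmp/example-supervisord.sock\n"
    "chmod=0700\n"
    "\n"
    "[supervisord]\n"
)

if __name__ == "__main__":
    print(ensure_chown(old))
```

Because the ownership is applied when supervisord creates its socket file, the affected supervisord instance would presumably need to be restarted after the edit for the change to take effect.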
On a successfully upgraded Contrail cluster, the 'chown=contrail:contrail' line is present in each supervisord config file:
root@aio3290:/etc/contrail# grep --exclude-dir=dir -R -i "chown=contrail:contrail" *
supervisord_analytics.conf:chown=contrail:contrail ; socket file uid:gid owner
supervisord_config.conf:chown=contrail:contrail ; socket file uid:gid owner
supervisord_control.conf:chown=contrail:contrail ; socket file uid:gid owner
supervisord_database.conf:chown=contrail:contrail ; socket file uid:gid owner
supervisord_vrouter.conf:chown=contrail:contrail ; socket file uid:gid owner
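The same verification can be turned around to list the files that still lack the line. The following is a minimal sketch, assuming the config files match the supervisord_*.conf naming seen in the output above; the function name files_missing_chown is hypothetical.

```python
import glob
import os
import re


def files_missing_chown(conf_dir, owner="contrail:contrail"):
    """Return the names of supervisord_*.conf files in conf_dir that have
    no 'chown=<owner>' line."""
    pattern = re.compile(r"^\s*chown\s*=\s*%s\b" % re.escape(owner),
                         re.MULTILINE)
    missing = []
    for path in sorted(glob.glob(os.path.join(conf_dir, "supervisord_*.conf"))):
        with open(path) as handle:
            if not pattern.search(handle.read()):
                missing.append(os.path.basename(path))
    return missing


if __name__ == "__main__":
    # /etc/contrail is the directory used in the grep example above.
    print(files_missing_chown("/etc/contrail"))
```

An empty list means every matching file already contains the line. Note that it is also empty when the directory holds no matching files, so the result is only meaningful on a node where the config files exist.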
2020-05-15: Article archived because upgrades via FAB scripts are EOL.