This article describes how to recover when 'contrail-status' reports a persistent 'api: initializing' status for Contrail config in an HA setup.
In some situations, after all HA clustered controller nodes deployed by the Ansible deployer are rebooted simultaneously, Contrail config might not work properly. If running 'contrail-status' shows 'api: initializing (Generic Connection:Keystone[] connection down)', take the following restoration measures.
Example output in the problematic state:
[root@ctr1 ~]# contrail-status
Pod Service Original Name State Id Status
redis contrail-external-redis running 755400c0d697 Up 8 hours
analytics api contrail-analytics-api running 81b1c997a19d Up 8 hours
analytics collector contrail-analytics-collector running 4772eb475411 Up 8 hours
analytics nodemgr contrail-nodemgr running 7df5528fd594 Up 8 hours
config api contrail-controller-config-api running 4be47140c657 Up 6 hours
config device-manager contrail-controller-config-devicemgr running d3555c52c649 Up 8 hours
config nodemgr contrail-nodemgr running d52848a80974 Up 8 hours
config schema contrail-controller-config-schema running 9bef9e05abb1 Up 8 hours
config svc-monitor contrail-controller-config-svcmonitor running 53bf4c3c6639 Up 6 hours
config-database cassandra contrail-external-cassandra running 21569d549c33 Up 8 hours
config-database nodemgr contrail-nodemgr running 56f5d0e93daf Up 8 hours
config-database rabbitmq contrail-external-rabbitmq running f0bfbe0b3811 Up 8 hours
config-database zookeeper contrail-external-zookeeper running 34217f7c5f7c Up 8 hours
control control contrail-controller-control-control running 7e3f18dc9cbe Up 8 hours
control dns contrail-controller-control-dns running c345b37de524 Up 8 hours
control named contrail-controller-control-named running ff8431b699eb Up 8 hours
control nodemgr contrail-nodemgr running be65013970ea Up 8 hours
database cassandra contrail-external-cassandra running b3e66b322883 Up 8 hours
database nodemgr contrail-nodemgr running 80fe3693a5a4 Up 8 hours
database query-engine contrail-analytics-query-engine running f643b33fc956 Up 8 hours
device-manager dnsmasq contrail-external-dnsmasq running 4136df2c78b9 Up 8 hours
webui job contrail-controller-webui-job running c172691b71a7 Up 8 hours
webui web contrail-controller-webui-web running eaea00f5c064 Up 8 hours
== Contrail control ==
control: active
nodemgr: active
named: active
dns: active
== Contrail config-database ==
nodemgr: active
zookeeper: active
rabbitmq: active
cassandra: active
== Contrail database ==
nodemgr: active
query-engine: active
cassandra: active
== Contrail analytics ==
nodemgr: active
api: active
collector: active
== Contrail webui ==
web: active
job: active
== Contrail device-manager ==
== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: backup
api: initializing (Generic Connection:Keystone[] connection down)
schema: backup
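In this state, the SQL server (MariaDB) is in trouble. Keystone depends on this database, which is why the config API reports the Keystone connection as down. The MariaDB Galera cluster does not start again on its own if all nodes were stopped simultaneously; this is by design, as it is with OpenStack Kolla deployments.
The mariadb container keeps restarting on all nodes, roughly every 40 seconds: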
[root@ctr1 ~]# docker ps | grep mariadb
9bcbaec82907 kolla/centos-binary-mariadb:queens "dumb-init -- kolla_…" 2 weeks ago Up 39 seconds mariadb
[root@ctr1 ~]# docker ps | grep mariadb
9bcbaec82907 kolla/centos-binary-mariadb:queens "dumb-init -- kolla_…" 2 weeks ago Up 40 seconds mariadb
[root@ctr1 ~]# docker ps | grep mariadb
9bcbaec82907 kolla/centos-binary-mariadb:queens "dumb-init -- kolla_…" 2 weeks ago Up Less than a second mariadb
[root@ctr1 ~]# docker ps | grep mariadb
9bcbaec82907 kolla/centos-binary-mariadb:queens "dumb-init -- kolla_…" 2 weeks ago Up 1 second mariadb
[root@ctr1 ~]# docker ps | grep mariadb
9bcbaec82907 kolla/centos-binary-mariadb:queens "dumb-init -- kolla_…" 2 weeks ago Up 2 seconds mariadb
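To confirm the restart loop without re-running `docker ps` by hand, the uptime in the Status column can be sampled and compared over time. The bash sketch below is a hypothetical helper, not part of Contrail or Kolla: `status_to_seconds` and `detect_restart_loop` are illustrative names, and the parser only understands the short Status forms seen above plus minutes and hours; anything else maps to -1 and is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Contrail or Kolla) for spotting
# the mariadb restart loop described above. Requires bash.

# Convert a `docker ps` Status string such as "Up 39 seconds" into seconds.
# Unrecognized forms (e.g. "Exited (1) ...", "Up 2 days") print -1.
status_to_seconds() {
  local status="$1"
  local re_sec='^Up ([0-9]+) seconds?'
  local re_min='^Up ([0-9]+) minutes?'
  local re_hour='^Up ([0-9]+) hours?'
  if [[ $status == "Up Less than a second"* ]]; then
    echo 0
  elif [[ $status == "Up About a minute"* ]]; then
    echo 60
  elif [[ $status == "Up About an hour"* ]]; then
    echo 3600
  elif [[ $status =~ $re_sec ]]; then
    echo "${BASH_REMATCH[1]}"
  elif [[ $status =~ $re_min ]]; then
    echo $(( ${BASH_REMATCH[1]} * 60 ))
  elif [[ $status =~ $re_hour ]]; then
    echo $(( ${BASH_REMATCH[1]} * 3600 ))
  else
    echo -1
  fi
}

# Sample the uptime of the container named exactly "mariadb" SAMPLES times,
# INTERVAL seconds apart. Returns 0 if the uptime ever decreases between
# two recognized samples (the container was restarted), 1 otherwise.
detect_restart_loop() {
  local samples="${1:-10}" interval="${2:-5}" prev=-1 cur status i
  for (( i = 0; i < samples; i++ )); do
    # The name filter is a substring match, so pick the exact name with awk
    # (this ignores e.g. a mariadbbootstrap container).
    status=$(docker ps --filter name=mariadb --format '{{.Names}}|{{.Status}}' \
      | awk -F'|' '$1 == "mariadb" { print $2 }')
    cur=$(status_to_seconds "$status")
    # Only compare recognized samples; "Restarting ..." etc. are skipped.
    if (( cur >= 0 )); then
      if (( prev >= 0 && cur < prev )); then
        echo "mariadb uptime dropped from ${prev}s to ${cur}s: likely restart loop"
        return 0
      fi
      prev=$cur
    fi
    sleep "$interval"
  done
  echo "no mariadb restart observed in $samples samples"
  return 1
}
```

For example, `detect_restart_loop 12 5` samples for about a minute, which is longer than the roughly 40-second cycle observed above.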
Starting MariaDB with the bootstrap option brings the SQL cluster back up. Perform the following steps on one controller node only.
- Stop the mariadb container on the controller node.
docker stop mariadb
- Move to the _data directory of the mariadb volume.
cd /var/lib/docker/volumes/mariadb/_data
- Check the grastate.dat file. "seqno" should show -1.
[root@ctr1 _data]# cat grastate.dat
# GALERA saved state
version: 2.1
uuid: 6c228030-9972-11ea-8072-0f2fcd13f957
seqno: -1
cert_index:
- Add the following line at the bottom of grastate.dat.
safe_to_bootstrap: 1
- Start a temporary bootstrap container named mariadbbootstrap with the following command:
docker run --net host --name mariadbbootstrap \
-v /etc/localtime:/etc/localtime:ro -v kolla_logs:/var/log/kolla/ -v mariadb:/var/lib/mysql -v /etc/kolla/mariadb/:/var/lib/kolla/config_files/:ro \
--restart on-failure:10 --env KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --env BOOTSTRAP_ARGS='--wsrep-new-cluster' kolla/centos-binary-mariadb:queens
You should then see logs similar to the following:
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc//my.cnf
<snip>
++ cat /run_command
+ CMD=/usr/bin/mysqld_safe
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/mariadb ]]
+++ stat -c %a /var/log/kolla/mariadb
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/mariadb
Running command: '/usr/bin/mysqld_safe --wsrep-new-cluster'
++ [[ -n '' ]]
++ [[ -n 0 ]]
++ ARGS=--wsrep-new-cluster
+ echo 'Running command: '\''/usr/bin/mysqld_safe --wsrep-new-cluster'\'''
+ exec /usr/bin/mysqld_safe --wsrep-new-cluster
200521 13:46:46 mysqld_safe Logging to '/var/log/kolla/mariadb/mariadb.log'.
200521 13:46:47 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/
200521 13:46:47 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql//wsrep_recovery.1c3wQu' --pid-file='/var/lib/mysql//ctr1-recover.pid'
200521 13:46:51 mysqld_safe WSREP: Recovered position 6c228030-9972-11ea-8072-0f2fcd13f957:407649
MariaDB on the other two nodes then synchronizes with the cluster and starts working, as shown in their log:
[root@ctr2 ~]# tail -f /var/lib/docker/volumes/kolla_logs/_data/mariadb/mariadb.log
2020-05-22 6:47:33 140079655401664 [Note] /usr/libexec/mysqld: ready for connections.
Version: '10.1.20-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2020-05-22 6:47:33 140074936666880 [Note] WSREP: Synchronized with group, ready for connections
- On the controller node where you performed the steps above, stop and remove the mariadbbootstrap container, then start the mariadb container again:
[root@ctr1 _data]#
docker stop mariadbbootstrap
docker rm mariadbbootstrap
docker start mariadb
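The grastate.dat check and edit from the steps above can also be scripted so that it is safe to run twice. This is a minimal sketch under stated assumptions: `prepare_grastate` is a hypothetical bash helper, not part of Kolla or Contrail; it assumes the `key: value` layout shown above and deliberately leaves the file unchanged unless `seqno` is -1, the only case this article covers. If a `safe_to_bootstrap` line already exists, it is set to 1 rather than duplicated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mark a Galera grastate.dat as safe to bootstrap.
# The default path is the Kolla mariadb volume used in this article.
prepare_grastate() {
  local file="${1:-/var/lib/docker/volumes/mariadb/_data/grastate.dat}"
  local seqno
  if [[ ! -f $file ]]; then
    echo "grastate file not found: $file" >&2
    return 1
  fi
  # Read the value of the "seqno:" line.
  seqno=$(awk -F': *' '$1 == "seqno" { print $2 }' "$file")
  if [[ $seqno != "-1" ]]; then
    echo "seqno is '$seqno', expected -1; leaving $file unchanged" >&2
    return 1
  fi
  if grep -q '^safe_to_bootstrap:' "$file"; then
    # Flip an existing flag instead of adding a second line.
    sed -i 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$file"
  else
    # Avoid gluing the new line onto an unterminated last line.
    if [[ -n $(tail -c 1 "$file") ]]; then
      echo >> "$file"
    fi
    echo 'safe_to_bootstrap: 1' >> "$file"
  fi
}
```

It would be run on the chosen node after `docker stop mariadb` and before starting the mariadbbootstrap container.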
The remaining controller node then comes back up, and 'contrail-status' reports the config API as active:
[root@ctr1 ~]# docker ps | grep mariadb
fae7fb310a93 kolla/centos-binary-mariadb:queens "dumb-init -- kolla_…" 2 weeks ago Up 2 minutes mariadb
[root@ctr1 ~]# contrail-status
<snip>
== Contrail config ==
svc-monitor: backup
nodemgr: active
device-manager: backup
api: active
schema: backup
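Checking the config API state can also be scripted, for example to wait after the restoration until the API leaves the initializing state. This is a minimal sketch that assumes the `contrail-status` output layout shown in this article (section headers such as `== Contrail config ==` followed by `name: state` lines); `config_api_state` and `wait_for_config_api` are hypothetical helper names, not Contrail tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on the contrail-status layout shown above.

# Read contrail-status output on stdin and print the state of "api"
# inside the "== Contrail config ==" section (empty if not found).
config_api_state() {
  awk '
    /^== / { in_config = ($0 == "== Contrail config =="); next }
    in_config && /^api: / { sub(/^api: /, ""); print; exit }
  '
}

# Poll contrail-status until the config API is active.
# TRIES checks, INTERVAL seconds apart; returns 1 if it never becomes active.
wait_for_config_api() {
  local tries="${1:-30}" interval="${2:-10}" state="" i
  for (( i = 0; i < tries; i++ )); do
    state=$(contrail-status | config_api_state)
    if [[ $state == active ]]; then
      echo "config api: active"
      return 0
    fi
    sleep "$interval"
  done
  echo "config api still '$state' after $tries checks" >&2
  return 1
}
```

Because only the config section is inspected, the `api: active` line under `== Contrail analytics ==` does not produce a false positive.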