
[CS0] Error 'FATAL cannot connect to etcd cluster'

Article ID: KB36453 | Last Updated: 03 Feb 2021 | Version: 1.0
Summary:

CSO (Contrail Service Orchestration) has a container-based microservices architecture. In an all-in-one installation, all of the CSO infrastructure components are hosted on a single server.

After the CSO install server was rebooted, the CSO GUI went down. This article explains how to troubleshoot this issue.

Symptoms:

After logging in to the CSO startup server VM, many container pods are found to be down, including the pod responsible for the CSO GUI.

root@test1:~# kubectl get pods -n central | egrep -v 1/1
NAME                                                       READY   STATUS             RESTARTS   AGE
csp.admin-portal-ui-78749cb74f-6rlpj                       0/1     CrashLoopBackOff   2068       46d
csp.csp-cslm-signature-ims-central-554c954875-6mn9j        3/3     Running            11         46d
csp.csp-cslm-signature-ims-central-core-74fc6496d8-ptmc7   3/3     Running            13         46d
csp.csp-dms-cms-inv-central-69f895bb89-99pg4               3/3     Running            10         46d
csp.csp-dms-cms-inv-central-core-6cc9df5d88-45xk8          3/3     Running            10         46d
csp.csp-hybrid-routing-pslam-598457cbd6-kwskr              3/3     Running            10         46d
csp.csp-hybrid-routing-pslam-core-5448b7db86-92dl8         3/3     Running            9          46d
csp.csp-iamsvc-iamsvc-noauth-697fd5d548-4w4fc              2/2     Running            7          46d
csp.csp-nso-vnfm-55fb664568-w2776                          0/2     CrashLoopBackOff   4149       46d
csp.csp-nso-vnfm-core-6979cb99cb-7kslh                     0/2     CrashLoopBackOff   4146       46d
csp.csp-policy-mgmt-shared-core-5c647d7cc7-9fddr           2/2     Running            10         46d
csp.csp-policy-mgmt-shared-ipam-55749c64c9-k6mhb           3/3     Running            9          46d
csp.csp-sse-6dbf68f5b9-m4cm5                               0/1     CrashLoopBackOff   2068       46d
csp.csp-template-as-central-7468bbfd46-drb6n               2/2     Running            7          46d
csp.ne-5c8d5fcb95-2lrlc                                    0/1     CrashLoopBackOff   2075       46d
csp.ne-core-f75d7b4db-st9sr                                0/1     CrashLoopBackOff   2074       46d
csp.nsd-ui-67d9458b67-rksbp                                0/1     CrashLoopBackOff   2068       46d
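
The `egrep -v 1/1` filter above still lets healthy multi-container pods (2/2 and 3/3 Running) through. Keying on the STATUS column instead isolates just the crashing pods; a minimal sketch (the helper name and the inline sample rows are illustrative):

```shell
# A sketch: key on the STATUS column (3rd field) so the header row and
# healthy pods drop out, leaving only the crashing pods.
# In practice the input would come from: kubectl get pods -n central
filter_crashloops() { awk 'NR>1 && $3=="CrashLoopBackOff"'; }

# Demo on sample rows taken from the listing above:
printf '%s\n' \
  'NAME READY STATUS RESTARTS AGE' \
  'csp.admin-portal-ui-78749cb74f-6rlpj 0/1 CrashLoopBackOff 2068 46d' \
  'csp.csp-template-as-central-7468bbfd46-drb6n 2/2 Running 7 46d' \
  | filter_crashloops
```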

Checking the logs of the affected containers reveals that the problematic containers are unable to reach the etcd component.

root@test1:/var/log/pods/central_csp.admin-portal-ui-78749cb74f-6rlpj_ffe01c5d-ebad-44da-9143-9e0ad0fdadec/admin-portal-ui# tail -100 2170.log 
{"log":"2\n","stream":"stdout","time":"2021-01-22T09:19:20.088897204Z"}
{"log":"1\n","stream":"stdout","time":"2021-01-22T09:19:20.088978244Z"}
{"log":"./confd -onetime -backend etcd -confdir ../../confd  -node http://etcd-etcd.infra.svc.cluster.local:2379\n","stream":"stdout","time":"2021-01-22T09:19:20.088986458Z"}
{"log":"2021-01-22T01:19:17-08:00 csp.admin-portal-ui-78749cb74f-6rlpj ./confd[12]: INFO Backend set to etcd\n","stream":"stdout","time":"2021-01-22T09:19:20.088995218Z"}
{"log":"2021-01-22T01:19:17-08:00 csp.admin-portal-ui-78749cb74f-6rlpj ./confd[12]: INFO Starting confd\n","stream":"stdout","time":"2021-01-22T09:19:20.08900064Z"}
{"log":"2021-01-22T01:19:17-08:00 csp.admin-portal-ui-78749cb74f-6rlpj ./confd[12]: INFO Backend nodes set to http://etcd-etcd.infra.svc.cluster.local:2379\n","stream":"stdout","time":"2021-01-22T09:19:20.089006113Z"}
{"log":"2021-01-22T01:19:20-08:00 csp.admin-portal-ui-78749cb74f-6rlpj ./confd[12]: FATAL cannot connect to etcd cluster: http://etcd-etcd.infra.svc.cluster.local:2379\n","stream":"stdout","time":"2021-01-22T09:19:20.089011888Z"}
{"log":"Error: Working directory: /slipstream/config/confd/script\n","stream":"stdout","time":"2021-01-22T09:19:20.089017454Z"}
{"log":"Error: Command ./confd -onetime -backend etcd -confdir ../../confd  -node http://etcd-etcd.infra.svc.cluster.local:2379 returned with exit code 1\n","stream":"stdout","time":"2021-01-22T09:19:20.089022634Z"}
Solution:
  1. From the CSO startup server, the nameserver (DNS) is not pingable.

    root@startupserver1:~# cat /etc/resolv.conf 
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 10.215.194.50
    search juniper.net
    root@startupserver1:~# ping 10.215.194.50
    PING 10.215.194.50 (10.215.194.50) 56(84) bytes of data.
    ^C
    --- 10.215.194.50 ping statistics ---
    18 packets transmitted, 0 received, 100% packet loss, time 17135ms


    The default gateway is also not reachable from the startup server VM.

    root@startupserver1:~# ping 10.219.90.65
    PING 10.219.90.65 (10.219.90.65) 56(84) bytes of data.
    ^C
    --- 10.219.90.65 ping statistics ---
    39 packets transmitted, 0 received, 100% packet loss, time 38304ms

    Next, ping the nameserver from the startup server while capturing packets on the br0 interface of the install server.
    root@test7:~# tcpdump -ni br0 | grep 10.215
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
    20:23:09.709851 IP 192.168.10.28.1656 > 10.215.194.50.53: 22614+ NS? . (17)
    20:23:09.709945 IP 192.168.10.28.49512 > 10.215.194.50.53: 35097+ NS? . (17)
    20:23:09.748158 IP 192.168.10.33 > 10.215.194.50: ICMP echo request, id 10480, seq 7, length 64
    20:23:10.756245 IP 192.168.10.33 > 10.215.194.50: ICMP echo request, id 10480, seq 8, length 64
    20:23:11.210464 IP 192.168.10.28.65103 > 10.215.194.50.53: 55966+ NS? . (17)
    20:23:11.210571 IP 192.168.10.28.28162 > 10.215.194.50.53: 2242+ NS? . (17)
    20:23:11.764263 IP 192.168.10.33 > 10.215.194.50: ICMP echo request, id 10480, seq 9, length 64
    20:23:12.711128 IP 192.168.10.28.56003 > 10.215.194.50.53: 52802+ NS? . (17)
    20:23:12.711155 IP 192.168.10.28.25817 > 10.215.194.50.53: 58311+ NS? . (17)
  2. The above capture shows that there is no ICMP reply from the install server, even though the echo requests reached it.

  3. Checking basic IPv4 forwarding on the CSO install machine reveals that the reboot disabled IPv4 forwarding:

    root@test7:~# cat /proc/sys/net/ipv4/ip_forward
    0
    root@test7:~#
  4. Re-enable IPv4 forwarding with the following command:

    root@test7:~# echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
    1
    root@test7:~#
  5. Once IPv4 forwarding is re-enabled, ping from the startup server works again and all reachability is restored. This resolved the issue.
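Because settings written to /proc/sys do not survive a reboot, and a reboot is exactly what triggered this incident, it may be worth persisting the flag. A minimal sketch, assuming a stock sysctl layout (some deployments manage this file elsewhere, so check before appending):

```shell
# Persist IPv4 forwarding so the next reboot does not disable it again.
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p    # reload; the ip_forward line should appear in the output
```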
