[CSO] How to recover a failed CSO VM component

Article ID: KB36450 KB Last Updated: 11 Mar 2021 Version: 1.0
Summary:
 

This article describes how to recover a failed virtual machine (VM) component in Contrail Service Orchestration (CSO) by using the recovery.sh utility.

 

Symptoms:
 

The VRR1 VM does not come up after a server reboot, and the VRR cannot be reached through Telnet. An attempt to open its console from the installation server fails:

root@host:/var/log/libvirt/qemu# virsh console vrr
error: failed to get domain 'vrr'
error: Domain not found: no domain with matching name 'vrr'

 

Cause:
 

In a CSO all-in-one deployment, six VMs are provisioned during installation. To display them, run virsh list on the CSO installation server:

root@host:~# virsh list 
 Id    Name                           State
----------------------------------------------------
 1     k8-microservices1              running
 2     contrail_analytics1            running
 3     k8-infra1                      running
 4     monitoring1                    running
 5     startupserver1                 running
 7     vrr1                           running
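The check above can be scripted. The following is a minimal sketch, not part of CSO: it assumes the six domain names shown in the listing above, and missing_domains is a hypothetical helper name. It reads domain names on standard input (the format that virsh list --all --name produces) and prints every expected name that is absent.

```shell
#!/bin/sh
# Expected domain names, copied from the virsh list output in this article.
# Adjust the list if your deployment uses different names (assumption).
EXPECTED_DOMAINS="k8-microservices1 contrail_analytics1 k8-infra1 monitoring1 startupserver1 vrr1"

# missing_domains (hypothetical helper): reads domain names on stdin,
# one per line, and prints each expected name that is not present.
missing_domains() {
    present=$(cat)
    for name in $EXPECTED_DOMAINS; do
        # -x matches the whole line, -F treats the name as a fixed string.
        printf '%s\n' "$present" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Usage on the installation server (requires libvirt's virsh):
#   virsh list --all --name | missing_domains
```

An empty result means all six domains are defined. A name such as vrr1 in the output means that domain is not defined on the host at all, which matches the "Domain not found" error shown in the Symptoms section.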

 

Solution:
 

Perform the following:

  1. Log in to the startup server VM from the installation server. 
  2. Go to the Contrail_Service_Orchestration_5.1.2 folder and run the recovery.sh script.

Note: recovery.sh is a utility provided by the CSO business unit (BU). It is added automatically during installation.

root@startupserver1:~/Contrail_Service_Orchestration_5.1.2# ./recovery.sh 

This tool assists you with recovering your CSO setup.

The following components can be recovered:

  1. cassandra
  2. mariadb
  3. vrr
  4. saltstack
  5. icinga
  6. rabbitmq

Specify one of the components to recover (in number) : 3

In this example, 3 selects the vrr component.

INFO     Started recovering vrr component at 2021-01-22 01:08:41.412041 ...
INFO     VRR recovery is initiated...
ERROR    Vrr - 192.168.10.29 is unhealthy
INFO     Recovery takes time, please be patient
INFO     VRR recovery started. Please wait...
INFO     Vrr console recovered for vrr1
INFO     VRR config sync 
INFO     Completed recovering vrr component at 2021-01-22 01:21:27.054657 .
INFO     Time taken to recover 0:12:45.642616 
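Note that the run above logs an ERROR line ("Vrr - 192.168.10.29 is unhealthy") and still finishes successfully, so the presence of ERROR alone does not indicate a failed recovery. The sketch below, which is not part of CSO, looks for the completion line instead. It assumes the message wording shown in the log above, and recovery_completed and recovery.log are hypothetical names for the helper and for a file in which you saved the script output.

```shell
#!/bin/sh
# recovery_completed (hypothetical helper): reads recovery.sh output on stdin
# and succeeds only if it contains the completion line for the given component.
# The message text is taken from the sample log in this article (assumption:
# other CSO releases may word it differently).
recovery_completed() {
    component=$1
    grep -q "Completed recovering ${component} component"
}

# Usage, assuming the output was saved to a file named recovery.log:
#   recovery_completed vrr < recovery.log && echo "vrr recovery completed"
```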

The log output above confirms that the VRR VM has been recovered. On the installation server, virsh list now shows vrr1 in the running state:

root@host:~# virsh list
 Id    Name                           State
----------------------------------------------------
 1     k8-microservices1              running
 2     contrail_analytics1            running
 3     k8-infra1                      running
 4     monitoring1                    running
 5     startupserver1                 running
 7     vrr1                           running
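To check one domain without reading the whole table, the State column can be extracted from the listing. This is a minimal sketch that assumes the three-column table layout shown above; domain_state is a hypothetical helper name. For a single domain, virsh domstate vrr1 reports the same information directly.

```shell
#!/bin/sh
# domain_state (hypothetical helper): reads `virsh list --all` table output
# on stdin and prints the State column for the named domain.
# The state may contain a space (for example "shut off"), so everything
# after the Name column is printed rather than a single field.
domain_state() {
    awk -v name="$1" '$2 == name { $1 = ""; $2 = ""; sub(/^ +/, ""); print }'
}

# Usage on the installation server (requires libvirt's virsh):
#   virsh list --all | domain_state vrr1
```

The match is on the exact domain name, so querying vrr returns nothing when the domain is defined as vrr1.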

You can now connect to the VRR1 console:

root@host:~# virsh console 7
Connected to domain vrr1
Escape character is ^]

root>
root> show version 
set version 15.1R6.7

 
