[CSO] Troubleshooting "Host is DOWN" or "Host is UNKNOWN" critical alarm

Article ID: KB37484 KB Last Updated: 29 Sep 2021 Version: 1.0

Summary:

This article describes an approach to debugging a host alarm for a host orchestrated by Contrail Service Orchestration (CSO).

Symptoms:

In the CSO UI, when users navigate to Monitor > Alerts & Alarms > Alarms, they may see the alarm description as "Host is DOWN" or "Host is UNKNOWN."

OR

When users navigate to Resources > Site Management, under the "Recent Alarms" critical section, they may see the "Host is UNKNOWN" or "Host is DOWN" alarm.

Cause:

The alarm could be due to any of the following reasons:

  • It is a genuine alarm indicating that the device has gone DOWN because of a problem, such as a power outage at the site. Finding the cause might require console access to the device.

  • It may be a false alarm, possibly due to connectivity issues between the host and CSO.

  • Telemetry data may not have reached CSO, which needs that data to determine the host's status. Understanding this problem requires checking multiple aspects.

Solution:

Perform the following to determine if there is a problem:

  1. Look up the device UUID by calling the following API, for example via Postman:

https://<CSO UI>/ems-central/device
"device": [
    {
        "fq_name": [
            "default-domain",
            "default-project",
            "Juniper-SRX-J1"
        ],
        "uuid": "31febd7b-664d-5718-54a9-51ac56edef4a",
        "uri": "/ems-central/device/31febd7b-664d-5718-54a9-51ac56edef4a"
    },
  2. Confirm that monitored_device_info is populated for the above UUID. The device UUID appears in the fq_name field of the matching entry:

https://<CSO UI>/fmpm-collector/monitored_device_info/
{
    "total": 900,
    "monitored_device_info": [
{
    "fq_name": [
        "31febd7b-664d-5718-54a9-51ac56edef4a"
    ],
    "uuid": "11fead7b-764d-4718-44a9-01ac56edef4a",
    "uri": "/fmpm-collector/monitored_device_info/11fead7b-764d-4718-44a9-01ac56edef4a"
},
  3. For more details, query the URI of the monitored_device_info entry:

https://<CSO UI>/fmpm-collector/monitored_device_info/11fead7b-764d-4718-44a9-01ac56edef4a
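The lookups in steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not a CSO client: it assumes the responses have been fetched already (authentication is not shown) and that they are JSON objects shaped like the excerpts above. The helper names and the host name cso.example.net are hypothetical.

```python
# Sample payloads shaped like the excerpts in this article. The top-level
# wrapping of the device listing is an assumption; real responses hold
# many more entries.
device_listing = {
    "device": [
        {
            "fq_name": ["default-domain", "default-project", "Juniper-SRX-J1"],
            "uuid": "31febd7b-664d-5718-54a9-51ac56edef4a",
            "uri": "/ems-central/device/31febd7b-664d-5718-54a9-51ac56edef4a",
        }
    ]
}
monitor_listing = {
    "total": 900,
    "monitored_device_info": [
        {
            "fq_name": ["31febd7b-664d-5718-54a9-51ac56edef4a"],
            "uuid": "11fead7b-764d-4718-44a9-01ac56edef4a",
            "uri": "/fmpm-collector/monitored_device_info/11fead7b-764d-4718-44a9-01ac56edef4a",
        }
    ],
}


def find_device_uuid(listing, device_name):
    """Return the UUID of the device whose fq_name ends with device_name."""
    for device in listing.get("device", []):
        fq_name = device.get("fq_name", [])
        if fq_name and fq_name[-1] == device_name:
            return device.get("uuid")
    return None


def find_monitor_entry(listing, device_uuid):
    """Return the monitored_device_info entry whose fq_name holds the device UUID."""
    for entry in listing.get("monitored_device_info", []):
        if device_uuid in entry.get("fq_name", []):
            return entry
    return None


def detail_url(cso_host, entry):
    """Build the step 3 URL from the entry's own uri field."""
    return "https://" + cso_host + entry["uri"]


device_uuid = find_device_uuid(device_listing, "Juniper-SRX-J1")
entry = find_monitor_entry(monitor_listing, device_uuid)
if entry is None:
    print("monitored_device_info is not populated for", device_uuid)
else:
    print(detail_url("cso.example.net", entry))
```

If find_monitor_entry returns None for a device UUID, the device has no monitored_device_info entry in the listing that was searched.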
 

To determine whether there are problems with the monitoring status, log in to Kibana.

  1. Open the Kibana UI and use the following filters:

    • Add a filter for service_app_name: telemetry-agent, device-connectivity, config-service and so on.

    • In the search section, add the device UUID, for example, "31febd7b-664d-5718-54a9-51ac56edef4a" (without a trailing period).

A correctly monitored host shows a telemetry-agent message similar to the following JSON snippet:

"service_modulename": "fmpm_listener",
"message": "Monitoring started for device id=31febd7b-664d-5718-54a9-51ac56edef4a and collector object id=11fead7b-764d-4718-44a9-01ac56edef4a for sensor-type=host",
"type": "csp_logs",

This message should appear once every 60 seconds.
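The 60-second cadence can be checked against timestamps copied from the Kibana results. The sketch below is only an illustration: the function name, the sample timestamps, and the 15-second tolerance are assumptions, not values defined by CSO.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval_s=60, tolerance_s=15):
    """Return (previous, current) pairs spaced further apart than interval + tolerance."""
    ordered = sorted(timestamps)
    limit = timedelta(seconds=interval_s + tolerance_s)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


start = datetime(2021, 9, 29, 12, 0, 0)
# Hypothetical timestamps: messages seen at 0, 60, 120 and 300 seconds.
seen = [start + timedelta(seconds=s) for s in (0, 60, 120, 300)]

for prev, cur in find_gaps(seen):
    print("No 'Monitoring started' message between", prev, "and", cur)
```

An empty result means every consecutive pair of messages arrived within the expected interval plus the chosen tolerance.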

An incorrect state, on the other hand, may involve the following:

  • The "Monitoring started" message not appearing every 60 seconds

OR

  • Other error messages being displayed along with a message similar to the following:

"service_modulename": "srx",
"message": "Command execution failed for request:['<get-software-information></get-software-information>'] and device id 31febd7b-664d-5718-54a9-51ac56edef4a",
"type": "csp_logs",

If you see any of these failure messages or error messages other than "Monitoring started for device," contact Support for assistance.
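When many log records are exported from Kibana, a small script can separate the expected message from everything else that mentions the device. The sketch below is a hedged example: the regular expression is derived only from the sample message in this article, and the classify function and its labels are hypothetical.

```python
import re

# Pattern derived from the sample "Monitoring started" message in this
# article; other CSO releases may word the message differently.
STARTED = re.compile(
    r"Monitoring started for device id=(?P<device>[0-9a-f-]{36}) "
    r"and collector object id=(?P<collector>[0-9a-f-]{36}) "
    r"for sensor-type=(?P<sensor>\S+)"
)


def classify(record, device_uuid):
    """Label a log record as 'ok', 'needs-review', or 'unrelated' for one device."""
    message = record.get("message", "")
    if device_uuid not in message:
        return "unrelated"
    match = STARTED.search(message)
    if match and match.group("device") == device_uuid:
        return "ok"
    return "needs-review"


DEVICE = "31febd7b-664d-5718-54a9-51ac56edef4a"
records = [
    {
        "service_modulename": "fmpm_listener",
        "message": "Monitoring started for device id=" + DEVICE
        + " and collector object id=11fead7b-764d-4718-44a9-01ac56edef4a"
        + " for sensor-type=host",
        "type": "csp_logs",
    },
    {
        "service_modulename": "srx",
        "message": "Command execution failed for request:"
        "['<get-software-information></get-software-information>']"
        " and device id " + DEVICE,
        "type": "csp_logs",
    },
]

for record in records:
    print(record["service_modulename"], classify(record, DEVICE))
```

Records labeled needs-review are the ones to include when contacting Support.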
