[MX/PTX] Frequent FPC disconnects from REs running virtualized Junos as a guest OS

Article ID: KB36384 | KB Last Updated: 26 Apr 2021 | Version: 2.0
Summary:

This article describes a scenario in which one or more FPCs go offline or reboot because a hardware component on the Routing Engine (RE) experiences a transient failure.

Symptoms:

One or more FPCs intermittently go offline and then come back online because the Routing Engine closes its connection to them.

The following messages are observed in the vmhost logs:

user@router> show vmhost logs syslog | match "rngd|tpm"
<DATE>T00:00:00.291227+00:00 router-node kernel: tpm_tis 00:0c: A TPM error (38) occurred attempting get random
<DATE>T00:00:00.326411+00:00 router-node rngd: read error
<DATE>T00:00:00.326426+00:00 router-node rngd: No entropy sources working, exiting rngd
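To check captured log output for this failure signature, the following minimal sketch can be used. It assumes you have saved the text printed by `show vmhost logs syslog | match "rngd|tpm"`. The helper name `find_rng_failure` and the rule that the signature requires both a TPM error and the rngd exit message are illustrative assumptions. They are not part of Junos or of this article.

```python
import re

# Hypothetical helper (not a Junos tool): looks for the two log messages
# shown above in text captured from "show vmhost logs syslog".
TPM_ERROR = re.compile(
    r"tpm_tis \S+: A TPM error \((\d+)\) occurred attempting get random"
)
RNGD_EXIT = re.compile(r"rngd: No entropy sources working, exiting rngd")


def find_rng_failure(log_text: str) -> dict:
    """Report which parts of the RNG failure signature appear in log_text."""
    tpm_codes = [int(m.group(1)) for m in TPM_ERROR.finditer(log_text)]
    rngd_exited = RNGD_EXIT.search(log_text) is not None
    return {
        "tpm_error_codes": tpm_codes,
        "rngd_exited": rngd_exited,
        # Assumption: treat it as a match only when a TPM error was
        # reported AND rngd has exited.
        "matches_signature": bool(tpm_codes) and rngd_exited,
    }


sample = """\
<DATE>T00:00:00.291227+00:00 router-node kernel: tpm_tis 00:0c: A TPM error (38) occurred attempting get random
<DATE>T00:00:00.326411+00:00 router-node rngd: read error
<DATE>T00:00:00.326426+00:00 router-node rngd: No entropy sources working, exiting rngd
"""
print(find_rng_failure(sample))
# {'tpm_error_codes': [38], 'rngd_exited': True, 'matches_signature': True}
```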

This issue has only been observed on Routing Engines running virtualized Junos (Junos running as a guest OS).

Examples of affected Routing Engines:

  • RE-S-X6-64G
  • RE-S-1600x8
  • RE-S-2X00x6
  • REMX2K-X8-64G
  • RE-PTX-X8-64G
  • EX9200-RE2
  • PTX10K-RE
Solution:

A random number generator (RNG) is a component, implemented in hardware or software, that takes a random input and generates a random output. The system uses this output for several purposes, including security functions and the communication sockets to the FPCs.

On Juniper MX and PTX devices, the RNG is located on the Routing Engine's TPM (Trusted Platform Module) chip, which handles encryption.

The RNG draws its random input from /dev/random, or from a different device file depending on the implementation. The kernel is responsible for keeping this input pool filled and ready to be consumed by the RNG. Over time, this input source may stop being available or may become inaccessible because of a transient hardware error. When this happens, calls to the RNG fail and the managing daemon (rngd) eventually exits.
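The failure sequence described above can be illustrated with a toy model. This is a simplified sketch and not rngd's actual source code. It assumes that a single failed read permanently disables a source, which is a deliberate simplification of the real daemon. `ReadError`, `run_daemon`, and `make_tpm_source` are hypothetical names. The model shows why the condition persists: once every source has been dropped, the daemon exits and nothing refills the pool until the RE is rebooted.

```python
from typing import Callable, List

# Toy model only: a hypothetical entropy source is a function that returns
# n bytes or raises ReadError (standing in for a transient TPM failure).
EntropySource = Callable[[int], bytes]


class ReadError(Exception):
    """Raised by a source when a read fails."""


def run_daemon(sources: List[EntropySource], rounds: int, chunk: int = 16) -> List[str]:
    """Simulate a daemon feeding the kernel pool; return the events it logs."""
    events: List[str] = []
    working = list(sources)
    for _ in range(rounds):
        for src in list(working):
            try:
                data = src(chunk)
                events.append(f"fed {len(data)} bytes")
                break  # one successful read per round is enough
            except ReadError:
                events.append("read error")
                working.remove(src)  # simplification: a failed source is dropped
        if not working:
            events.append("No entropy sources working, exiting")
            return events
    return events


def make_tpm_source(fail_after: int) -> EntropySource:
    """Hypothetical TPM-backed source that starts failing after N reads."""
    calls = {"n": 0}

    def read(n: int) -> bytes:
        calls["n"] += 1
        if calls["n"] > fail_after:
            raise ReadError("TPM error")
        return bytes(n)  # placeholder bytes; a real RNG returns random data

    return read


print(run_daemon([make_tpm_source(fail_after=2)], rounds=5))
# ['fed 16 bytes', 'fed 16 bytes', 'read error', 'No entropy sources working, exiting']
```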

This can lead to unpredictable Routing Engine behavior. In some cases, several FPCs have been seen to go offline or reboot. This sequence can repeat multiple times until the input source is made available again by rebooting the affected Routing Engine.

To recover from this condition, reboot the impacted Routing Engine:

request vmhost reboot [re0|re1]

On systems with dual Routing Engines, fail over the primary role to the working RE before rebooting the impacted RE.
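The recovery order above can be sketched as a small planning helper. `recovery_steps` is a hypothetical name. The failover step is written as a description rather than a CLI command, because this article does not give the switchover syntax; use the command documented for your release. One further assumption, not stated in the article: when the impacted RE is already the backup, no failover is needed before rebooting it.

```python
from typing import List, Optional


def recovery_steps(impacted_re: str, dual_re: bool,
                   primary_re: Optional[str] = None) -> List[str]:
    """Return the ordered recovery actions for the impacted RE (sketch)."""
    if impacted_re not in ("re0", "re1"):
        raise ValueError("impacted_re must be 're0' or 're1'")
    other = "re1" if impacted_re == "re0" else "re0"
    steps: List[str] = []
    # Assumption: failover is only needed when the impacted RE holds the primary role.
    if dual_re and primary_re == impacted_re:
        steps.append(f"Fail over the primary role from {impacted_re} to {other} (the working RE)")
    # Reboot command as given in this article.
    steps.append(f"request vmhost reboot {impacted_re}")
    return steps


for step in recovery_steps("re0", dual_re=True, primary_re="re0"):
    print(step)
# Fail over the primary role from re0 to re1 (the working RE)
# request vmhost reboot re0
```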

Junos software has also been updated to prevent this issue under PR1349373, "FPCs might reboot continuously until the system is rebooted or RE switchover".

Please refer to the PR details for fixed versions.

Modification History:
2021-04-26: Minor, non-technical edit.