[MX] One failed FPC triggers system failure by blocking internal connectivity between RE and other FPCs.

Article ID: KB35609 KB Last Updated: 27 Mar 2020 Version: 1.0

Summary:

When a failed FPC repeatedly tries to reboot, it keeps the PCIe bus busy. This blocks communication between the RE and the other, working FPCs.

Symptoms:

The following sample log messages indicate an FPC going into a boot loop.

FPC goes offline due to a hardware issue:

CHASSISD_IPC_CONNECTION_DROPPED: Dropped IPC connection for FPC 2

FPC tries to boot up:

CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 11, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC: MPC4E 3D 2CGE+8XGE @ 10/*/*, jnxFruType 3, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 1856596303)

I/O bus busy:

fpc2 io_err bus 0 busy timeout 

The chassis daemon drops its connections to other FPCs:

CHASSISD_IPC_CONNECTION_DROPPED: Dropped IPC connection for FPC 1
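
The sequence above can be checked for in collected syslog output. The following is a minimal Python sketch, not a Juniper tool: the regular expressions are modeled only on the sample messages in this article, and the function name `summarize` and the threshold `min_power_ons` are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the sample messages in this article (assumption:
# production logs use the same wording after their timestamp/host prefix).
DROPPED = re.compile(
    r"CHASSISD_IPC_CONNECTION_DROPPED: Dropped IPC connection for FPC (\d+)")
POWER_ON = re.compile(
    r"FRU power on \(.*?jnxFruName FPC: .*?jnxFruSlot (\d+)")
BUS_BUSY = re.compile(r"fpc(\d+) io_err bus \d+ busy timeout")


def summarize(lines, min_power_ons=3):
    """Group the three symptom messages by FPC slot.

    min_power_ons is a hypothetical threshold: a slot with at least this
    many "FRU power on" traps is treated as a boot-loop suspect.
    """
    power_ons, bus_busy, dropped = Counter(), Counter(), Counter()
    for line in lines:
        match = POWER_ON.search(line)
        if match:
            power_ons[int(match.group(1))] += 1
            continue
        match = BUS_BUSY.search(line)
        if match:
            bus_busy[int(match.group(1))] += 1
            continue
        match = DROPPED.search(line)
        if match:
            dropped[int(match.group(1))] += 1
    suspects = sorted(s for s, n in power_ons.items() if n >= min_power_ons)
    # Slots that lost their IPC connection but are not themselves looping.
    collateral = sorted(s for s in dropped if s not in suspects)
    return {
        "suspects": suspects,
        "bus_busy_slots": sorted(bus_busy),
        "collateral": collateral,
    }
```

A suspect slot that also appears under `bus_busy_slots`, together with a non-empty `collateral` list, matches the pattern described in this article. The threshold should be tuned to the time window of the log being examined.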

Cause:

The FPC with the hardware error repeatedly tries to boot and enters a boot loop, which keeps the PCIe bus busy. The other FPCs share this bus to communicate with the RE. Because keepalives from the working FPCs are no longer received, the system eventually takes those FPCs offline. A single failed FRU can therefore bring down the entire system.

Solution:

Although the system will take an FPC with a major error offline, it is advisable to offline the failed FPC using the CLI to avoid this scenario (slot 2 in this example):

request chassis fpc offline slot 2

This issue is documented in PR1319560 - The MPC with specific failure hardware might impact other MPCs in the same chassis.

Fixed JUNOS releases:

  • 17.2R3
  • 18.1R2
  • 18.2R1
  • 16.1R4-S12
  • 17.3R3
  • 17.4R2
  • 17.2X75-D91
  • 16.1R7-S1
  • 17.4R1-S3
  • 18.2X75-D5

In the fixed releases, traffic from the failed FPC is rate limited at the internal Ethernet switch, which switches traffic between the RE and the other FPCs.
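
Why rate limiting the failed FPC helps can be illustrated with a toy model in Python. This is only a sketch under stated assumptions and does not represent the actual Junos implementation: the function `delivered_keepalives`, its parameters, and the frame counts are all hypothetical. It assumes a shared path that carries a fixed number of frames per keepalive interval, and, as a worst case, that the failed FPC's frames are queued ahead of the keepalives.

```python
def delivered_keepalives(healthy_fpcs, flood_frames, capacity, rate_limit=None):
    """Return the healthy FPC slots whose keepalive reaches the RE.

    Toy model of one keepalive interval on a shared path:
      - the failed FPC offers `flood_frames` frames, queued first;
      - each healthy FPC then offers one keepalive frame;
      - the path carries at most `capacity` frames, the rest are lost.
    If `rate_limit` is set, the failed FPC's frames are capped at that value.
    """
    if rate_limit is not None:
        flood_frames = min(flood_frames, rate_limit)
    remaining = max(capacity - flood_frames, 0)
    return healthy_fpcs[:remaining]


# Without a limit, the failed FPC fills the path and no keepalive gets through.
print(delivered_keepalives([0, 1, 3], flood_frames=100, capacity=10))

# With the failed FPC capped, room remains for every healthy FPC's keepalive.
print(delivered_keepalives([0, 1, 3], flood_frames=100, capacity=10, rate_limit=4))
```

In this model the first call returns an empty list and the second returns all three healthy slots, which mirrors the behavior described above: once the failed FPC's traffic is capped, the working FPCs keep their connection to the RE.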