
[MX] TFEB crash/reboot due to high CPU utilization


Article ID: KB35809 KB Last Updated: 21 May 2020Version: 1.0
Summary:

The TFEB (Trio Forwarding Engine Board) is a control board that communicates with the Routing Engine using a dedicated link and transfers routing table data from the Routing Engine to the Forwarding Table in the Application-Specific Integrated Circuit (ASIC). The link is also used to transfer routing link-state updates and other packets destined for the router from the TFEB to the Routing Engine. 

Cause:

The crash of the TFEB in the chassis is a consequence of a high CPU event. The Routing Engine (RE) and the TFEB communicate over a dedicated IPC connection that is created in software, and this connection is maintained by exchanging keepalives between the RE and the TFEB, similar to TCP-based protocols. If the keepalives are lost, the ukernel on the TFEB triggers a reboot to try to recover its communication with the Routing Engine.

A software-triggered reboot of the TFEB is not necessarily an indication of a hardware failure; in most cases it is a self-recovery mechanism. When the device is overwhelmed with incoming traffic, the DDoS protection violation logs indicate that the allowed bandwidth was exceeded on the FPCs. When the resulting CPU interrupts reach a certain level, the TFEB can crash with a PANIC.

Apr 16 14:20:40 jddosd[2149]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception  resolve:ucast-v4 exceeded its allowed bandwidth at fpc 0 for  3 times, started at 2019-04-16 16:10:39 MDT
Apr 16 14:25:53  rcv: ch_ipc_dispatch() null ipc read for args 0x2342020 pipe 0x23412c0, fru TFEB 0
Apr 16 14:25:53  -- TFEB 0, last request 0, state Online
Apr 16 14:29:53  disconnect: slot 0
Apr 16 14:30:17   ppmd[1944]: PPMD: Connection Shutdown/Closed with  PFE: tfeb0
Apr 16 14:30:44   chassisd[1933]: CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: TFEB connection lost  <-- Loss of keepalives between RE and TFEB
Apr 16 14:30:53 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(0)
Apr 16 14:31:22   tfeb0 CLKSYNC: failed to connect to Master
Apr 16 14:31:23   tfeb0 L2ALM: failed to connect to Master

The behavior detected on the router is due to a PANIC event triggered by an overwhelmed TFEB card.
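To confirm that the reboot was caused by a panic rather than a manual restart, you can check whether the TFEB generated a core file. This is a general Junos operational command, shown here as an illustration; the presence, names, and locations of core files vary by platform and release:

lab@router> show system core-dumps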

As shown below, the router CPU consumption records a high percentage:

Routing Engine status:
Temperature                 43 degrees C / 109 degrees F
CPU temperature             52 degrees C / 125 degrees F
DRAM                      2048 MB (2048 MB installed)
Memory utilization          53 percent
5 sec CPU utilization:
User                      33 percent
Background                 0 percent
Kernel                    61 percent
Interrupt                  1 percent
Idle                       5 percent  <--  CPU utilization is at 95%
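Before restarting anything, it can help to confirm whether DDoS protection violations are still active and which protocol group is affected. The following operational commands are shown as a general illustration; the exact protocol groups listed depend on the platform and release:

lab@router> show ddos-protection protocols violations
lab@router> show ddos-protection statistics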

 
Solution:

If the TFEB fails to come back online automatically, restart it from the CLI with the following command:

lab@router> request chassis tfeb restart

Then, verify the TFEB status:

lab@router> show chassis tfeb
TFEB status:
Slot 0 information:
State Online
Intake temperature 40 degrees C / 104 degrees F
Exhaust temperature 56 degrees C / 132 degrees F
CPU utilization 12 percent
Interrupt utilization 0 percent
Heap utilization 27 percent
Buffer utilization 13 percent
Total CPU DRAM 1024 MB
Start time: 2019-04-17 12:15:44 UTC
Uptime: 15 minutes, 43 seconds


If the logs are still flooded with DDoS violation messages after the CLI restart and CPU utilization remains high, the source of the traffic must be identified in order to prevent further impact. To do so, configure DDoS flow detection to identify which IP address is sending the excess traffic that is causing the DDoS violations.

Refer to the technical documentation on Configuring How Flow Detection Operates for Individual Protocol Groups or Packets

Example configuration based on the log message posted in this document:

lab@router# set system ddos-protection protocols resolve ucast-v4 flow-detection-mode automatic
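On platforms that support suspicious control flow detection, flow detection generally also needs to be enabled globally before per-protocol settings take effect, and the detected culprit flows can then be displayed. The commands below are a sketch based on standard Junos DDoS protection configuration; verify availability for your platform and release:

lab@router# set system ddos-protection global flow-detection
lab@router# commit

lab@router> show ddos-protection protocols culprit-flows

The culprit-flows output identifies the offending source, which can then be filtered or rate-limited closer to the edge.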
