The TFEB (Trio Forwarding Engine Board) is a control board that communicates with the Routing Engine using a dedicated link and transfers routing table data from the Routing Engine to the Forwarding Table in the Application-Specific Integrated Circuit (ASIC). The link is also used to transfer routing link-state updates and other packets destined for the router from the TFEB to the Routing Engine.
The TFEB crash in this scenario is a consequence of a high-CPU event. The Routing Engine (RE) and the TFEB communicate over a dedicated IPC connection that is created by the software. The connection is maintained by keepalives exchanged between the RE and the TFEB, similar to TCP-based protocols. If the keepalives are lost, the ukernel of the TFEB triggers a reboot to try to recover its communication with the Routing Engine.
A software-triggered reboot of the TFEB is not necessarily an indication of a hardware failure; in most cases, it is a self-recovery mechanism. When the device is overwhelmed with incoming traffic, the DDoS protection violation logs indicate that the allowed bandwidth was exceeded on the FPCs. When the resulting CPU interrupts reach a certain level, they can cause the TFEB to crash with a PANIC. The following log excerpt shows this sequence:
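The keepalive behavior described above can be pictured with a small model. The following Python sketch is purely illustrative and is not Junos code: the class name `KeepaliveWatchdog`, the one-second interval, and the three-miss threshold are all hypothetical values chosen for the example, not documented TFEB timers.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveWatchdog:
    """Toy model of a peer that reboots when keepalives stop arriving."""
    interval_s: float = 1.0   # hypothetical keepalive interval
    max_missed: int = 3       # hypothetical number of tolerated misses
    last_seen_s: float = 0.0  # arrival time of the most recent keepalive

    def on_keepalive(self, now_s: float) -> None:
        # Record the arrival time of the latest keepalive.
        self.last_seen_s = now_s

    def should_reboot(self, now_s: float) -> bool:
        # The link is considered dead once the silence exceeds
        # max_missed keepalive intervals.
        return (now_s - self.last_seen_s) > self.interval_s * self.max_missed


wd = KeepaliveWatchdog()
wd.on_keepalive(now_s=10.0)
print(wd.should_reboot(now_s=12.0))  # False: only 2 s of silence
print(wd.should_reboot(now_s=14.5))  # True: 4.5 s of silence exceeds 3 s
```

The point of the sketch is that a busy CPU that delays keepalives long enough looks, from the peer's side, the same as a dead link.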
Apr 16 14:20:40 jddosd[2149]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception resolve:ucast-v4 exceeded its allowed bandwidth at fpc 0 for 3 times, started at 2019-04-16 16:10:39 MDT
Apr 16 14:25:53 rcv: ch_ipc_dispatch() null ipc read for args 0x2342020 pipe 0x23412c0, fru TFEB 0
Apr 16 14:25:53 -- TFEB 0, last request 0, state Online
Apr 16 14:29:53 disconnect: slot 0
Apr 16 14:30:17 ppmd[1944]: PPMD: Connection Shutdown/Closed with PFE: tfeb0
Apr 16 14:30:44 chassisd[1933]: CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: TFEB connection lost <-- Loss of keepalives between RE and TFEB
Apr 16 14:30:53 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(0)
Apr 16 14:31:22 tfeb0 CLKSYNC: failed to connect to Master
Apr 16 14:31:23 tfeb0 L2ALM: failed to connect to Master
The behavior detected on the router is due to a PANIC event triggered by an overwhelmed TFEB card.
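When reviewing a larger log file, it can help to check whether a DDoS violation precedes the loss of the TFEB connection. The following Python sketch is one possible way to do that offline; the function names are hypothetical, the match strings are taken from the log lines in this document, and the year is supplied by hand because syslog timestamps omit it.

```python
from datetime import datetime

# Two lines copied from the log excerpt in this document.
LOG = """\
Apr 16 14:20:40 jddosd[2149]: DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for protocol/exception resolve:ucast-v4 exceeded its allowed bandwidth at fpc 0 for 3 times, started at 2019-04-16 16:10:39 MDT
Apr 16 14:30:44 chassisd[1933]: CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: TFEB connection lost
"""


def parse_ts(line, year=2019):
    # The first 15 characters hold the syslog timestamp, e.g. "Apr 16 14:20:40".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def seconds_from_violation_to_loss(log_text):
    """Return seconds between the first DDoS violation and the first
    later 'TFEB connection lost' message, or None if the pair is absent."""
    violation = None
    for line in log_text.splitlines():
        if violation is None and "DDOS_PROTOCOL_VIOLATION_SET" in line:
            violation = parse_ts(line)
        elif violation is not None and "TFEB connection lost" in line:
            return (parse_ts(line) - violation).total_seconds()
    return None


print(seconds_from_violation_to_loss(LOG))  # 604.0
```

In the excerpt above, the connection loss is logged roughly ten minutes after the first violation message.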
As shown below in the output of show chassis routing-engine, the Routing Engine CPU utilization is very high:
Routing Engine status:
Temperature 43 degrees C / 109 degrees F
CPU temperature 52 degrees C / 125 degrees F
DRAM 2048 MB (2048 MB installed)
Memory utilization 53 percent
5 sec CPU utilization:
User 33 percent
Background 0 percent
Kernel 61 percent
Interrupt 1 percent
Idle 5 percent <-- CPU utilization is at 95%
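The 95% figure follows from the idle value: busy CPU is 100 minus idle, which matches the sum of the other fields. As a minimal sketch, assuming the output has been saved as text, the following Python snippet derives it; the function name `cpu_busy_percent` is hypothetical.

```python
import re

# The 5-second CPU utilization fields from the output above.
OUTPUT = """\
User 33 percent
Background 0 percent
Kernel 61 percent
Interrupt 1 percent
Idle 5 percent
"""


def cpu_busy_percent(text):
    # Collect every "<Name> <N> percent" line into a dict of integers.
    pairs = re.findall(r"^\s*(\w+)\s+(\d+) percent", text, re.M)
    fields = {name: int(value) for name, value in pairs}
    return 100 - fields["Idle"], fields


busy, fields = cpu_busy_percent(OUTPUT)
print(busy)  # 95
# Cross-check: the non-idle fields add up to the same value.
print(sum(v for k, v in fields.items() if k != "Idle"))  # 95
```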
If the software does not bring the TFEB back online automatically, restart the TFEB from the CLI with the following command:
lab@router> request chassis tfeb restart
Then, verify the TFEB status:
lab@router# run show chassis tfeb
TFEB status:
Slot 0 information:
State Online
Intake temperature 40 degrees C / 104 degrees F
Exhaust temperature 56 degrees C / 132 degrees F
CPU utilization 12 percent
Interrupt utilization 0 percent
Heap utilization 27 percent
Buffer utilization 13 percent
Total CPU DRAM 1024 MB
Start time: 2019-04-17 12:15:44 UTC
Uptime: 15 minutes, 43 seconds
If the logs are still flooded with DDoS violation messages after the CLI restart and CPU utilization remains high, identify the source of the traffic to prevent further problems. To do so, configure flow detection to identify which IP address is sending the excess traffic that causes the DDoS violation and the resulting traffic impact.
Refer to the technical documentation on Configuring How Flow Detection Operates for Individual Protocol Groups or Packets.
Example configuration based on the log message posted in this document:
lab@router# set system ddos-protection protocols resolve ucast-v4 flow-detection-mode automatic
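The protocol group and packet type in this command come from the `resolve:ucast-v4` field of the DDOS_PROTOCOL_VIOLATION_SET message. As a rough sketch, the following Python snippet extracts that pair and assembles the same command. It assumes that the names in the log map directly to configuration keywords, as they do in this example; that may not hold for every protocol group, so verify the result against the CLI before applying it. The function name is hypothetical.

```python
import re

# Violation message copied from the log excerpt in this document.
LOG_LINE = (
    "Apr 16 14:20:40 jddosd[2149]: DDOS_PROTOCOL_VIOLATION_SET: Warning: "
    "Host-bound traffic for protocol/exception resolve:ucast-v4 exceeded "
    "its allowed bandwidth at fpc 0 for 3 times, started at "
    "2019-04-16 16:10:39 MDT"
)


def flow_detection_command(log_line):
    """Build a flow-detection set command from a violation message,
    or return None if the message does not match."""
    match = re.search(r"protocol/exception (\S+):(\S+) exceeded", log_line)
    if match is None:
        return None
    group, packet_type = match.groups()
    return (
        f"set system ddos-protection protocols {group} {packet_type} "
        "flow-detection-mode automatic"
    )


print(flow_detection_command(LOG_LINE))
```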