CFEB crash/reboot on M10i

Article ID: KB35828 KB Last Updated: 09 Apr 2021 Version: 2.0
Summary:

The M10i Multiservice Edge Router houses either a Compact Forwarding Engine Board (CFEB) or an Enhanced Compact Forwarding Engine Board (CFEB-E), located at the rear of the router above the power supplies. The board performs route lookup, shared-memory management, filtering, and switching on incoming data packets, then directs outbound packets to the appropriate FPC for transmission to the network. It can process 15 million packets per second (Mpps). This article explains how to troubleshoot and recover from a CFEB crash or reboot on M10i devices.

Symptoms:

Flapping CFEB status or CFEB offline.

user@host> show chassis cfeb
CFEB status:
  State              Offline 
Cause:
  1. Traffic congestion on the Routing Engine (RE)
  2. Hardware issue with the CFEB
  3. No rate limiting in place for traffic destined to the Routing Engine
Solution:

The CFEB is a control board that communicates with the Routing Engine and transfers routing table data from the Routing Engine to the forwarding table in the Application-Specific Integrated Circuit (ASIC). A CFEB crash or restart is a consequence of either a high-CPU event or a hardware failure.

A software-triggered reboot of the CFEB is not necessarily an indication of a hardware failure; in most cases it is a self-recovery mechanism. When the device is overwhelmed with incoming traffic, the RE CPU is likely to be exhausted. When CPU interrupts reach a certain level, the CFEB can crash with a panic. With a hardware issue, the CFEB status may fluctuate transiently, or the board may be permanently damaged. In either case, it is best to maintain redundant CFEBs to guard against such failures.

Restart the CFEB (perform this action only if CFEB is in Offline state):

user@host> request chassis cfeb restart

Verify the CFEB CPU and RE CPU utilization:

user@host> show chassis cfeb
CFEB status:
Slot 0 information:
  State                              Master   
  Intake temperature                 35 degrees C / 95 degrees F
  Exhaust temperature                43 degrees C / 109 degrees F
  CPU utilization                    93 percent  <-- HIGH CPU utilization at CFEB
  Interrupt utilization               0 percent
  Heap utilization                   15 percent
  Buffer utilization                 22 percent
  Total CPU DRAM                     128 MB
  Internet Processor II              Version 1, Foundry IBM, Part number 164
  Start time:                        2020-16-05 03:24:15 PST
  Uptime:                            2 hours, 56 minutes, 18 seconds
user@host> show chassis routing-engine
Routing Engine status:
Temperature                 43 degrees C / 109 degrees F
CPU temperature             52 degrees C / 125 degrees F
DRAM                        2048 MB (2048 MB installed)
Memory utilization          53 percent
5 sec CPU utilization:
User                        36 percent
Background                   0 percent
Kernel                      61 percent
Interrupt                    1 percent
Idle                         2 percent   <-- HIGH CPU utilization at RE     
 
user@host> show pfe statistics traffic   <-- Run this multiple times and note down the increment in drops
...
    Software input medium drops         :             26371432
    Software output drops               :               400769
    Hardware input drops                :            183799250
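
To make the increments easier to compare between runs, the output can be narrowed to the drop counters. This is a minimal sketch that assumes the standard CLI `| match` pipe is available on your release; the sample output is omitted because it depends on the device.

```
user@host> show pfe statistics traffic | match drops
```

Run the command twice, a known interval apart, and subtract the readings. As a purely hypothetical illustration: if Hardware input drops reads 183799250 on the first run and 183859250 sixty seconds later, the increment is 60000 drops, or roughly 1000 drops per second. A counter that keeps climbing at a steady rate points to ongoing congestion rather than a past event.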

   
High CPU utilization results from excessive host-bound traffic being punted to the control plane. RE CPU utilization eventually climbs to 100% and stays there. Keepalives between the CFEB and the RE are then lost, communication between them fails, and the CFEB restarts.

To avoid high CPU utilization in the control plane, configure a filter on the lo0 interface so that unwanted host-bound traffic does not reach the control plane. This is recommended on all Junos platforms. When you create an additional loopback interface, apply a filter to it as well so that the Routing Engine stays protected. We recommend that you include the apply-groups statement when you apply a filter to the loopback interface; doing so ensures that the filter is automatically inherited by every loopback interface, including lo0 and other loopback interfaces. Refer to TN226 - Using loopback filter to protect M, T, MX routers' routing-engine from DoS attack.
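
The apply-groups recommendation can be sketched as follows. This is an illustration only: it assumes a filter named protect-RE is already defined, the group name lo-protect is hypothetical, and the wildcard patterns should be verified on your Junos release before committing.

```
groups {
    /* lo-protect is a hypothetical group name */
    lo-protect {
        interfaces {
            /* match every configured loopback interface */
            <lo*> {
                /* match every configured unit on it */
                unit <*> {
                    family inet {
                        filter {
                            input protect-RE;
                        }
                    }
                }
            }
        }
    }
}
apply-groups lo-protect;
```

The group is inherited only by loopback interfaces and units that already exist in the configuration. To check what was inherited, `show configuration interfaces lo0 | display inheritance` can be used.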

Example Configuration:

lo0 {
    unit 0 {
        family inet {
            filter {
                input protect-RE;
            }
            address xxx.xx.x.x/32;
        }
    }
}
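
The example above references a filter named protect-RE without showing its definition. The following is a minimal sketch of what such a filter might look like, not a recommended production policy. The prefix list mgmt-hosts, the policer limit-icmp, the term names, the counter name, the rate values, and the documentation prefix 192.0.2.0/24 are all assumptions made for illustration.

```
policy-options {
    /* hypothetical list of trusted management hosts */
    prefix-list mgmt-hosts {
        192.0.2.0/24;
    }
}
firewall {
    /* hypothetical policer; tune the rates for your network */
    policer limit-icmp {
        if-exceeding {
            bandwidth-limit 1m;
            burst-size-limit 15k;
        }
        then discard;
    }
    family inet {
        filter protect-RE {
            /* allow SSH only from the management prefix list */
            term ssh-from-mgmt {
                from {
                    source-prefix-list {
                        mgmt-hosts;
                    }
                    protocol tcp;
                    destination-port ssh;
                }
                then accept;
            }
            /* accept ICMP, but rate-limit it */
            term icmp-limited {
                from {
                    protocol icmp;
                }
                then {
                    policer limit-icmp;
                    accept;
                }
            }
            /* count and drop everything else sent to the RE */
            term drop-rest {
                then {
                    count protect-RE-dropped;
                    discard;
                }
            }
        }
    }
}
```

As written, the final term also discards routing protocol and other management traffic. A real filter needs an accept term for every protocol the router relies on (for example BGP, OSPF, NTP, or SNMP) placed before the last term. Committing with `commit confirmed` lets the change roll back automatically if the filter cuts off your own session.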


If the logs are still flooded with DDoS violation messages and CPU utilization remains high, identify the source of the traffic to prevent further impact. To do so, configure flow monitoring to determine which IP address is sending the excess traffic.

For flow detection, refer to Configuring How Flow Detection Operates for Individual Protocol Groups or Packets.

If CPU utilization is normal and the CFEB is still crashing, the hardware component may be at fault. Restart the CFEB from the CLI using request chassis cfeb restart.
If the CFEB continues to flap after a soft restart, perform a hard reset of the CFEB. If the issue persists, a hardware failure is likely; contact your JTAC representative.

Collect the following output while the issue is occurring, for JTAC analysis:

user@host> show chassis routing-engine
user@host> show system processes extensive
user@host> show pfe statistics traffic
user@host> monitor traffic interface lo0.0 no-resolve    <-- Run for a few minutes

user@host> start shell pfe network ?
Possible completions:
  cfeb0                Connect to Compact Forwarding Engine Board 0

Once in the shell, please run the commands below and provide outputs:

# show syslog messages   
# show nvram


Note: To ensure high availability on devices running critical services, install two CFEBs. With two CFEBs installed (CFEB 0 and CFEB 1), one takes the primary role and the other takes the backup role.

user@host> show chassis cfeb
CFEB status:
Slot 0 information:
  State                              Master    
  Intake temperature                 35 degrees C / 95 degrees F
  Exhaust temperature                43 degrees C / 109 degrees F
  CPU utilization                     3 percent
  Interrupt utilization               0 percent
  Heap utilization                   10 percent
  Buffer utilization                 22 percent
  Total CPU DRAM                     128 MB
  Internet Processor II              Version 1, Foundry IBM, Part number 164
  Start time:                        2020-16-16 03:24:15 PST
  Uptime:                            12 hours, 56 minutes, 18 seconds
Slot 1 information:
  State                                 Backup 
Modification History:
2021-04-09: Updated the article terminology to align with Juniper's Inclusion & Diversity initiatives
