
Syslog message: XMCHIP.*FI.*Cell underflow at the state stage


Article ID: KB31611   KB Last Updated: 07 Oct 2021   Version: 7.0
Summary:

The "FI Cell underflow at the state stage" message reports that fabric cells are arriving late at the egress PFE.

Symptoms:

When a "FI Cell underflow at the state stage" event occurs, a message similar to the following is reported:
Sep 22 07:30:10 router0 : %PFE-3: fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 65535
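When scanning collected logs, it can help to pull the FPC, XMCHIP instance, stream, and count out of each message. The following is a minimal Python sketch, assuming the message text matches the sample above; the regular expression and the `parse_underflow` helper are hypothetical and may need adjusting for other Junos releases.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, modeled on the sample message in this article
UNDERFLOW_RE = re.compile(
    r"(?P<fpc>fpc\d+)\s+XMCHIP\((?P<chip>\d+)\):\s+FI:\s+"
    r"Cell underflow at the state stage - "
    r"Stream (?P<stream>\d+), Count (?P<count>\d+)"
)


class UnderflowEvent(NamedTuple):
    fpc: str     # FPC that logged the message, e.g. "fpc2"
    xmchip: int  # XMCHIP instance on that FPC
    stream: int  # fabric stream number
    count: int   # Count field from the message


def parse_underflow(line: str) -> Optional[UnderflowEvent]:
    """Return the fields of an FI cell-underflow message, or None if the line does not match."""
    m = UNDERFLOW_RE.search(line)
    if m is None:
        return None
    return UnderflowEvent(m["fpc"], int(m["chip"]), int(m["stream"]), int(m["count"]))


sample = ("Sep 22 07:30:10 router0 : %PFE-3: fpc2 XMCHIP(1): FI: "
          "Cell underflow at the state stage - Stream 12, Count 65535")
print(parse_underflow(sample))
# UnderflowEvent(fpc='fpc2', xmchip=1, stream=12, count=65535)
```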

Indications:

  1. Service impact: persistent packet drops.
  2. With the fix for PR1076299, a major alarm will be raised if the rate is higher than 100/second.
  3. The Count value in the message represents the number of 64-byte cells dropped.
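Because the counter reports 64-byte cells, the approximate volume dropped is the Count value multiplied by 64. A minimal Python sketch of that arithmetic; the `dropped_bytes` helper is hypothetical, and treating every cell as 64 bytes of lost data is only an approximation of the packet bytes actually lost.

```python
from typing import Iterable

CELL_SIZE_BYTES = 64  # the article states the counter reports 64B cells


def dropped_bytes(cell_counts: Iterable[int]) -> int:
    """Approximate bytes dropped, summed over the Count values of one or more messages."""
    return sum(cell_counts) * CELL_SIZE_BYTES


# One message with Count 65535, as in the sample message
print(dropped_bytes([65535]))  # 4194240 (about 4 MB)
```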
Cause:

Possible causes include:

  • A software defect in the fabric manager
  • A fault on an ingress FPC that is sending to the egress FPC reporting the error condition
  • Fabric congestion caused by a specific traffic pattern on the affected PFE/MPC

In general, this error means that fabric cells are arriving late at the egress PFE. The egress PFE reports the error, but the actual problem could be on the ingress PFE sending corrupted packets to the egress PFE, or in the fabric itself.

The enhancement added through PR1076299 raises a minor chassis alarm for this error if the rate exceeds a threshold of 100 per second. If this message is reported continuously for the same stream, it is most likely a fabric wedge.
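To spot the "same stream reported continuously" pattern in a saved log, one approach is to count consecutive underflow messages per FPC, XMCHIP, and stream. This Python sketch is an illustration only, not the PR1076299 alarm logic: the `persistent_streams` helper and its `min_run` threshold of 3 are hypothetical, and unrelated log lines are skipped rather than breaking a run.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical pattern, modeled on the sample message in this article
STREAM_RE = re.compile(
    r"(fpc\d+)\s+XMCHIP\((\d+)\):\s+FI:\s+"
    r"Cell underflow at the state stage - Stream (\d+)"
)

StreamKey = Tuple[str, int, int]  # (fpc, xmchip, stream)


def persistent_streams(lines: Iterable[str], min_run: int = 3) -> List[StreamKey]:
    """Return streams that appear in at least min_run back-to-back underflow messages.

    Non-matching log lines are skipped, so they neither extend nor break a run.
    A stream is appended once each time one of its runs reaches min_run.
    """
    flagged: List[StreamKey] = []
    last_key: Optional[StreamKey] = None
    run = 0
    for line in lines:
        m = STREAM_RE.search(line)
        if m is None:
            continue
        key = (m.group(1), int(m.group(2)), int(m.group(3)))
        run = run + 1 if key == last_key else 1
        last_key = key
        if run == min_run:
            flagged.append(key)
    return flagged


# Made-up sample lines for illustration
log = [
    "fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 10",
    "fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 12",
    "unrelated line",
    "fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 9",
    "fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 7, Count 1",
]
print(persistent_streams(log))  # [('fpc2', 1, 12)]
```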

For a local transient error fault:
With the fixes for PR1264656 and PR1262868, a major CM alarm is raised upon a fabric stream wedge, and the event "FI: Cell underflow errors with reorder engine pointers stalled" or "FI: Link sanity check and high rate cell underflow errors" is reported when the stream wedge condition is declared.
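A simple way to check whether a wedge was declared is to search the collected logs for the two event strings quoted above. A minimal Python sketch, assuming the strings appear verbatim in the log lines; the `wedge_declarations` helper and the sample lines are hypothetical.

```python
from typing import Iterable, List

# Event strings quoted in this article for a declared stream wedge
WEDGE_EVENTS = (
    "FI: Cell underflow errors with reorder engine pointers stalled",
    "FI: Link sanity check and high rate cell underflow errors",
)


def wedge_declarations(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain either wedge-declaration event string."""
    return [line for line in lines if any(event in line for event in WEDGE_EVENTS)]


# Made-up sample lines; real messages may carry more fields
log = [
    "fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 65535",
    "fpc2 XMCHIP(1): FI: Cell underflow errors with reorder engine pointers stalled",
]
print(wedge_declarations(log))  # only the second line is returned
```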

 

Solution:

Perform these steps to determine the cause and resolve the problem (if any). Continue through each step until the problem is resolved.

  1. Collect the show command output.

    Capture the output to a file (in case you have to open a technical support case). To do this, configure your SSH client or terminal emulator to log the session.

    MX platform:

    show log messages
    show log chassisd
    start shell pfe network fpc<fpc#>
    show nvram
    show syslog messages
    exit

    SRX platform:

    Collect the output of the following commands first, then, if time allows, follow the instructions in KB21781: Data to Collect for all configurations:

    show log messages
    show log chassisd
    request pfe execute target tnp tnp-name [node#].<fpc#>.<pic#> command "show syslog messages"
    (MPCs only): request pfe execute target tnp-name [node#].<fpc#> command "show nvram"

  2. Analyze the show command output.

    In the 'show log messages' output, review the events that occurred at or just before the appearance of the "FI Cell underflow at the state stage" message. These events frequently help identify the cause.
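If the session log has been saved to a file, the lines leading up to the first underflow message can be pulled out programmatically. The Python sketch below assumes one log entry per line and uses a line-count window rather than timestamps, since syslog timestamps carry no year; the `context_before_first_underflow` helper and its default of 20 lines are hypothetical choices.

```python
from typing import List, Sequence

MARKER = "Cell underflow at the state stage"


def context_before_first_underflow(lines: Sequence[str], before: int = 20) -> List[str]:
    """Return up to `before` lines preceding the first underflow message, plus that message.

    Returns an empty list if no underflow message is present.
    """
    for i, line in enumerate(lines):
        if MARKER in line:
            return list(lines[max(0, i - before): i + 1])
    return []


# Made-up sample lines for illustration
log = ["event A", "event B", "event C",
       "fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 5"]
print(context_before_first_underflow(log, before=2))
# ['event B', 'event C', 'fpc2 XMCHIP(1): FI: Cell underflow at the state stage - Stream 12, Count 5']
```

With a real capture, the lines could be read with `open("messages.log").read().splitlines()`, where `messages.log` is a hypothetical file name for the saved 'show log messages' output.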

  • No RMA required.

  • With the fixes for PR1264656 and PR1262868, a local transient error condition is detected and a major alarm is raised if a stream wedge is found.

  • With the fix for PR1186421, major alarms on MX routers default to the disable-pfe action. On an SRX chassis cluster, a chassis cluster failover is triggered instead.

  • (Does not apply to the SRX platform.) The generic pfe-disable event script detects a permanent impact on packet forwarding that results from a local transient error condition and invokes the pfe-disable action. See KB31867 - [Junos] Generic pfe-disable event script.

  • An FPC reboot at a later time is needed to bring the PFE back into service.

  • Contact your technical support representative if the issue persists after an FPC restart. In that case, it could be triggered by true fabric congestion caused by a specific traffic pattern on the affected PFE/MPC.
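The resolution bullets above can be summarized as a small decision helper. This is a hypothetical Python sketch that only encodes what the bullets state (the PR1186421 default actions, the later FPC reboot, and escalation if the issue persists after a restart); the platform labels "mx" and "srx-cluster" are assumptions, and it is not a Juniper tool.

```python
from typing import List


def recommended_actions(platform: str, seen_after_fpc_restart: bool) -> List[str]:
    """Hypothetical summary of the resolution bullets; platform is "mx" or "srx-cluster"."""
    if seen_after_fpc_restart:
        # Recurrence after a restart may point to true fabric congestion
        return ["contact your technical support representative"]
    if platform == "mx":
        # With the PR1186421 fix, major alarms default to the disable-pfe action
        automatic = "major alarm takes the disable-pfe action"
    elif platform == "srx-cluster":
        # On an SRX cluster, a chassis cluster failover is triggered instead
        automatic = "chassis cluster failover is triggered"
    else:
        raise ValueError(f"unsupported platform: {platform!r}")
    # No RMA is required; the PFE returns to service after a later FPC reboot
    return [automatic, "reboot the FPC later to bring the PFE back into service"]


print(recommended_actions("mx", seen_after_fpc_restart=False))
```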

This article is indexed in KB31893 - Primary Index of Articles for Troubleshooting PFE ASIC Syslog Events; tag XMCHIPTSG

Modification History:

2020-07-30: Fixed broken link.
2017-09-11: Added SRX-specific information and qualified non-SRX actions (such as the disable-pfe script) with the "(MX only)" phrase.
2017-08-07: Added the pfe-disable event-script as a mitigation solution.

 
