
Syslog message: MQSS.*FI.*Child drop error


Article ID: KB32342   KB Last Updated: 24 Feb 2020   Version: 4.0
Summary:

The "Child drop error" message reports that a cell was dropped during allocation in the fabric input (FI) reorder engine.


This is a Troubleshooting Article for a PFE ASIC Syslog Event.
To view other documented syslog events related to XMCHIP, XLCHIP, MQCHIP, LUCHIP, EACHIP, and PECHIP, see KB31893 - Master Index of Articles for Troubleshooting PFE ASIC Syslog Events.

Symptoms:

When a "Child drop error" event occurs, messages similar to the following are reported, often in combination with other FI errors:

Aug 21 21:56:28.423 2017 router fpc14 MQSS(1): FI: Cell underflow at the state stage (Cell behind reorder window) - Stream 60, Count 18634
Aug 21 21:56:28.423 2017 router fpc14 MQSS(1): FI: Reorder cell timeout - Stream 60, Count 12327
Aug 21 21:56:28.423 2017 router fpc14 MQSS(1): FI: Cell jump drop error - Stream 60, Count 69
Aug 21 21:56:28.423 2017 router fpc14 MQSS(1): FI: Child drop error - Stream 60, Count 290
Aug 21 21:56:30.424 2017 router fpc14 MQSS(1): FI: Cell underflow at the state stage (Cell behind reorder window) - Stream 60, Count 19353
Aug 21 21:56:30.424 2017 router fpc14 MQSS(1): FI: Reorder cell timeout - Stream 60, Count 11206
Aug 21 21:56:30.424 2017 router fpc14 MQSS(1): FI: Cell jump drop error - Stream 60, Count 73
Aug 21 21:56:30.424 2017 router fpc14 MQSS(1): FI: Child drop error - Stream 60, Count 255
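
To make messages like the ones above easier to tabulate, the following is a minimal Python sketch that splits one FI syslog line into fields. The regular expression is modeled only on the sample lines shown here, and the function name `parse_fi_line` and the field names are illustrative assumptions; other Junos releases or syslog configurations may format the line differently.

```python
import re
from datetime import datetime

# Pattern modeled on the sample lines in this article (an assumption, not an
# official format definition).
FI_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d{3} \d{4}) "
    r"(?P<host>\S+) (?P<fpc>fpc\d+) MQSS\((?P<pfe>\d+)\): FI: "
    r"(?P<error>.+?) - Stream (?P<stream>\d+), Count (?P<count>\d+)$"
)


def parse_fi_line(line):
    """Return a dict of fields for an MQSS FI syslog line, or None if it does not match."""
    m = FI_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(m["ts"], "%b %d %H:%M:%S.%f %Y"),
        "host": m["host"],
        "fpc": m["fpc"],
        "pfe": int(m["pfe"]),
        "error": m["error"],
        "stream": int(m["stream"]),
        "count": int(m["count"]),
    }
```

Calling `parse_fi_line` on each line of a saved log and grouping the results by `error` and `stream` gives a quick view of which FI errors accompany the Child drop error.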

 

Indications:

  • This event never occurs alone. The accompanying Cell underflow or Cell timeout errors raise a Minor alarm if the fabric input error rate reaches the threshold of 100 per second three times. The PFE reporting the FI errors is usually not the fault location. A Child drop error on its own therefore does not trigger an alarm.

  • Junos 17.3R3 and later implement a syslog severity heuristic to distinguish spurious from persistent error events. The heuristic uses a threshold (50) and a
    time period (600 seconds) to decide the log severity level. If the error count is below the threshold, the message is logged with severity LOG_WARNING.
    If the error count is above the threshold, the message is logged with severity LOG_ERR.
  • The volume of dropped packets is related to the error cell counts reported in the messages.

  • Without the fix for PR1276301, there is a risk of permanent impact on packet forwarding.
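
The severity heuristic described above can be pictured with a small Python sketch. Only the threshold (50), the time period (600 seconds), and the two severity names come from this article. The sliding-window accounting, the handling of a count exactly equal to the threshold, and the class name `SeverityHeuristic` are assumptions made for illustration and do not describe the actual Junos implementation.

```python
from collections import deque

THRESHOLD = 50        # error-count threshold stated in this article
PERIOD_SECONDS = 600  # time period stated in this article


class SeverityHeuristic:
    """Illustrative model only; the real Junos logic is not documented here."""

    def __init__(self, threshold=THRESHOLD, period=PERIOD_SECONDS):
        self.threshold = threshold
        self.period = period
        self.events = deque()  # (timestamp_seconds, error_count) pairs
        self.total = 0

    def record(self, now, count):
        """Add an error report and return the severity it would be logged with."""
        self.events.append((now, count))
        self.total += count
        # Assumption: reports older than the period no longer contribute.
        while self.events and self.events[0][0] <= now - self.period:
            _, old_count = self.events.popleft()
            self.total -= old_count
        # Assumption: a total exactly at the threshold stays at LOG_WARNING.
        return "LOG_ERR" if self.total > self.threshold else "LOG_WARNING"
```

Under these assumptions, a short burst that stays under 50 errors within 600 seconds is logged as a warning, while a persistent condition that keeps accumulating errors is promoted to LOG_ERR.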

 

Cause:

This is a signature of severe fabric cell reordering (cells arriving out of order) and may have the following causes:

  • Many cells are dropped because of CRC errors between the ingress and the egress PFE, caused by a hardware defect

  • Transient or sustained oversubscription of the egress PFE, for example on MX2K platforms with SFB

  • High transient oversubscription of fabric traffic flow with MPC9E and SFB2, which is resolved in PR1304801

Solution:



Perform these steps to determine the cause and resolve the problem (if any).  Continue through each step until the problem is resolved.

  1. Collect the show command output.

    Capture the output to a file (in case you have to open a technical support case). To do this, configure your SSH client or terminal emulator to log the session.

    show log messages
    show log chassisd
    start shell pfe network <fpc#>
    show nvram
    show syslog messages
    exit


  2. Analyze the show command output.

    In the 'show log messages' output, review the events that occurred at or just before the appearance of the "Child drop error" message. These events frequently help identify the cause.

    • Examine the possible triggers listed in the Cause section.

    • Contact your technical support representative for more analysis.
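
As a rough aid for the analysis step above, the following Python sketch pulls out the log lines recorded at or shortly before the first "Child drop error" in a saved copy of the 'show log messages' output. The match pattern is taken from the title of this article. The timestamp format is assumed from the sample messages in the Symptoms section, and the function name and the default 60-second window are illustrative choices, not part of any Juniper tool.

```python
import re
from datetime import datetime, timedelta

CHILD_DROP = re.compile(r"MQSS.*FI.*Child drop error")  # pattern from the article title
# Timestamp layout assumed from the sample messages, e.g. "Aug 21 21:56:28.423 2017".
TS = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d{3} \d{4}) ")


def _timestamp(line):
    """Return the leading timestamp of a log line, or None if it is not recognized."""
    m = TS.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%b %d %H:%M:%S.%f %Y")


def events_before_first_child_drop(lines, window_seconds=60):
    """Return other lines logged at, or up to window_seconds before, the first Child drop error."""
    first = next((line for line in lines if CHILD_DROP.search(line)), None)
    if first is None:
        return []
    t0 = _timestamp(first)
    if t0 is None:
        return []
    start = t0 - timedelta(seconds=window_seconds)
    selected = []
    for line in lines:
        t = _timestamp(line)
        # Lines without a recognizable timestamp are skipped.
        if t is not None and start <= t <= t0 and not CHILD_DROP.search(line):
            selected.append(line)
    return selected
```

A typical use would be to read the captured session log with `open(path).read().splitlines()` and pass the result to `events_before_first_child_drop`, widening `window_seconds` if nothing relevant appears.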

 

This article is indexed in KB31893 - Master Index of Articles for Troubleshooting PFE ASIC Syslog Events; tag EACHIPTSG


Tip: When looking at an event in the logs, it is important to focus on the first error message in a collection of syslog messages. The first error message is usually the cause of all the follow-on error messages. The follow-on collateral damage error messages can be ignored.
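
One way to apply this tip to a long log is to group messages into bursts and look only at the first line of each burst. The Python sketch below does this under stated assumptions: lines are in chronological order, timestamps follow the format of the sample messages in this article, and a gap of more than 5 seconds starts a new burst. The gap value and the function name are hypothetical and should be tuned to the log at hand.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the sample messages, e.g. "Aug 21 21:56:28.423 2017".
TS = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d{3} \d{4}) ")


def first_message_per_burst(lines, max_gap_seconds=5.0):
    """Return the first line of each burst of messages separated by quiet gaps."""
    firsts = []
    previous = None
    for line in lines:
        m = TS.match(line)
        if m is None:
            continue  # skip lines without a recognizable timestamp
        t = datetime.strptime(m.group(1), "%b %d %H:%M:%S.%f %Y")
        # A new burst starts at the first line or after a long enough quiet gap.
        if previous is None or (t - previous).total_seconds() > max_gap_seconds:
            firsts.append(line)
        previous = t
    return firsts
```

With the sample messages from the Symptoms section, the two groups logged 2 seconds apart fall into a single burst, so only the first Cell underflow line is returned.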

 

Modification History:


Note: KB Team - All changes to this article must be approved by the AI-Scripts Review team (pvs-scripts-review@juniper.net) before re-publishing.


AI-Scripts history:       (Updated by AI-Scripts team only)
 
Date        KB Article Version   AI-Scripts PR (optional)   Notes
1/16/2018   1.0                  1332636                    New Syslog KB
       

2020-02-24: modified indication content with syslog heuristic

 