Syslog message: EA.*HMCIF Rx: Link.: A response packet with a FATAL state is received from HMC - State: 0x1f

Article ID: KB32344   KB Last Updated: 11 Oct 2021   Version: 4.0
Summary:

The "response packet with a FATAL state is received from HMC with State 0x1f" message reports a multi-bit uncorrectable error condition in the Hybrid Memory Cube.

This is a Troubleshooting Article for a PFE ASIC Syslog Event.
To view other documented syslog events related to XMCHIP, XLCHIP, MQCHIP, LUCHIP, EACHIP, and PECHIP, see KB31893 - Index of Articles for Troubleshooting PFE ASIC Syslog Events.

Symptoms:

When a "response packet with a FATAL state is received from HMC with State: 0x1f" event occurs, a message similar to the following is reported:

Nov 6 01:48:52 router chassisd[50920]: ASIC Error detected errorno 0x002400c1 Restart action performed
Nov 6 01:48:52 router chassisd[50920]: CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 8 offline: FPC reset by error manager
Nov 6 01:48:52 router fpc8 Cmerror Op Set: EA[2:0]::FATAL ERROR!! from EA[2:0]: HMCIF Rx: Link0: A response packet with a FATAL state is received from HMC - State: 0x1f, Count 2
Nov 6 01:48:52 router fpc8 Cmerror Op Set: EA[2:0]::FATAL ERROR!! from EA[2:0]: HMCIF Rx: Link1: A response packet with a FATAL state is received from HMC - State: 0x1f, Count 2
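When scanning a saved log for this event, it can help to pull the FPC slot, EA instance, link, state, and count out of each message. The following is a minimal sketch in Python, assuming the message layout matches the samples above. The pattern, the function name `parse_hmc_fatal`, and the field names are illustrative and not part of Junos OS.

```python
import re

# Pattern modeled on the sample Cmerror messages above; group names are illustrative.
HMC_FATAL = re.compile(
    r"fpc(?P<fpc>\d+) Cmerror Op Set: "
    r"(?P<module>EA\[\d+:\d+\]).*?"
    r"HMCIF Rx: Link(?P<link>\d+): A response packet with a FATAL state "
    r"is received from HMC - State: (?P<state>0x[0-9a-fA-F]+), Count (?P<count>\d+)"
)

def parse_hmc_fatal(line):
    """Return the fields of an HMC FATAL-state message, or None if the line is unrelated."""
    m = HMC_FATAL.search(line)
    if m is None:
        return None
    return {
        "fpc": int(m.group("fpc")),
        "module": m.group("module"),
        "link": int(m.group("link")),
        "state": int(m.group("state"), 16),  # hex string to integer
        "count": int(m.group("count")),
    }

# Sample line copied from the log excerpt above.
sample = (
    "Nov 6 01:48:52 router fpc8 Cmerror Op Set: EA[2:0]::FATAL ERROR!! from EA[2:0]: "
    "HMCIF Rx: Link0: A response packet with a FATAL state is received from HMC - "
    "State: 0x1f, Count 2"
)
print(parse_hmc_fatal(sample))
```

Lines that do not carry the HMC FATAL-state text, such as the chassisd notices, return None and can be skipped.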

 

The messages log in Junos OS 17.3 or higher includes the register dump:

Nov 24 11:15:13.984 atlas-re0 : %PFE-3: fpc8 Dumping Micron HMC 62 FATAL ERR DUMP 3181 entries ...
Nov 24 11:15:13.989 atlas-re0 : %PFE-3: fpc8 Addr , Data Addr , Data
Nov 24 11:15:13.994 atlas-re0 : %PFE-3: fpc8 0x002c8001, 0x7634001d 0x002c8002, 0x762bd20e
Nov 24 11:15:13.998 atlas-re0 : %PFE-3: fpc8 0x002c8004, 0x0111222c 0x002c8003, 0x009b0000
Nov 24 11:15:14.004 atlas-re0 : %PFE-3: fpc8 0x002c8000, 0x00000002 0x00288000, 0x00000009
Nov 24 11:15:14.009 atlas-re0 : %PFE-3: fpc8 0x00288001, 0x00000009 0x00288002, 0xc6420100
<..>
Nov 24 11:15:14.014 atlas-re0 : %PFE-7: fpc8 Cmerror: Draining ASIC error message queue
Nov 24 11:15:14.014 atlas-re0 : %PFE-7: fpc8 cmerror_process_queue: module = EA[0:0]
Nov 24 11:15:14.015 atlas-re0 : %PFE-7: fpc8 Cmerror: processing the task op_type 1 for level 1 level_count 0 occur_count 0 clear_count 0 level_threshold 1 level_action 0x44 item errid 2359489 item_threshold 1 item_count 0 item_sub_err_state 0 sub_item errid 0 sub_item_state 0 item_time
Nov 24 11:15:14.015 atlas-re0 : %PFE-7: fpc8 Cmerror: Level 1 count increment 1 occur_count 1 clear_count 0
Nov 24 11:15:14.015 atlas-re0 : %PFE-6: fpc8 Error: /fpc/8/pfe/0/cm/0/EA[0:0]/0/EACHIP_CMERROR_HMCIF_RX_LINK_INT_REG_HMC_FATAL_ERR (0x2400c1), severity: major, module: EA[0:0], type: HMCIF RX link int reg HMC fatal err
Nov 24 11:15:14.015 atlas-re0 : %PFE-7: fpc8 Cmerror: Level 1 count 1 (occur_count 1 clear_count 0)crossed threshold 1 action 0x44
Nov 24 11:15:14.015 atlas-re0 : %PFE-7: fpc8 cmerror_take_action_helper: performing action 4 for level 1 err_id /fpc/8/pfe/0/cm/0/EA[0:0]/0/EACHIP_CMERROR_HMCIF_RX_LINK_INT_REG_HMC_FATAL_ERR (0x2400c1) module id 13
Nov 24 11:15:14.020 atlas-re0 : %PFE-7: fpc8 cmerror_take_action_helper: performing action 2 for level 1 err_id /fpc/8/pfe/0/cm/0/EA[0:0]/0/EACHIP_CMERROR_HMCIF_RX_LINK_INT_REG_HMC_FATAL_ERR (0x2400c1) module id 13
<..>
Nov 24 11:15:15.262 atlas-re0 chassisd[5135]: %DAEMON-3: FPC 8 sent action 0x44 that is not configured 0x2
Nov 24 11:15:15.262 atlas-re0 alarmd[5630]: %DAEMON-4: Alarm set: FPC color=RED, class=CHASSIS, reason=FPC 8 Major Errors - EA Error code: 0x2400c1
Nov 24 11:15:15.262 atlas-re0 craftd[5138]: %DAEMON-4: Major alarm set, FPC 8 Major Errors - EA Error code: 0x2400c1
<..>
Nov 24 11:15:26.180 atlas-re0 : %PFE-7: fpc8 cmerror_take_action_helper: performing action 40 for level 1 err_id /fpc/8/pfe/0/cm/0/EA[0:0]/0/EACHIP_CMERROR_HMCIF_RX_LINK_INT_REG_HMC_FATAL_ERR (0x2400c1) module id 13
Nov 24 11:15:26.180 atlas-re0 : %PFE-5: fpc8 PFE 0: 'PFE Disable' action performed. Bringing down ifd et-8/0/2 224
Nov 24 11:15:26.231 atlas-re0 : %PFE-5: fpc8 PFE 0: 'PFE Disable' action performed. Bringing down ifd et-8/0/5 225
<..>
Nov 24 11:15:26.528 atlas-re0 : %PFE-6: fpc8 PFE:pfe_set_fe_down_flag First active PFE ID changed from 0 to 1
Nov 24 11:15:26.529 atlas-re0 : %PFE-3: fpc8 Cmerror Op Set: EA[0:0]: EA[0:0]: HMCIF Rx: Link0: A response packet with a FATAL state is received from HMC - State: 0x1f, Count 2
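The same error appears in these logs under several notations: `errorno 0x002400c1` from chassisd, `item errid 2359489` in the Cmerror processing line, and `(0x2400c1)` in the `Error:` line. These are one value written in hexadecimal and decimal. The sketch below, assuming the `Error:` line layout shown above, extracts the error name and code and confirms the conversion. The pattern and the name `parse_error_line` are illustrative, not a Junos OS interface.

```python
import re

# Pattern modeled on the "%PFE-6: fpc8 Error: ..." sample line above.
ERROR_LINE = re.compile(
    r"fpc(?P<fpc>\d+) Error: (?P<path>/\S+) \((?P<code>0x[0-9a-fA-F]+)\), "
    r"severity: (?P<severity>\w+), module: (?P<module>[^,]+), type: (?P<type>.+)$"
)

def parse_error_line(line):
    """Return the fields of a cmerror 'Error:' line, or None if the line does not match."""
    m = ERROR_LINE.search(line)
    if m is None:
        return None
    return {
        "fpc": int(m.group("fpc")),
        "name": m.group("path").rsplit("/", 1)[-1],  # last path component
        "code": int(m.group("code"), 16),
        "severity": m.group("severity"),
        "module": m.group("module"),
        "type": m.group("type"),
    }

# Sample line copied from the log excerpt above.
sample = (
    "Nov 24 11:15:14.015 atlas-re0 : %PFE-6: fpc8 Error: "
    "/fpc/8/pfe/0/cm/0/EA[0:0]/0/EACHIP_CMERROR_HMCIF_RX_LINK_INT_REG_HMC_FATAL_ERR "
    "(0x2400c1), severity: major, module: EA[0:0], "
    "type: HMCIF RX link int reg HMC fatal err"
)
info = parse_error_line(sample)

# The hex code from the Error line, the decimal errid, and the zero-padded
# chassisd errorno all denote the same integer.
assert info["code"] == 2359489 == int("0x002400c1", 16)
print(info["name"], hex(info["code"]), info["code"])
```

Searching a log for any of the three notations therefore finds the same error.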


 

Indications:

  • PFE forwarding is permanently impacted, and the MPC is restarted automatically because the error has fatal severity.
  • On a fatal error, an HMC register dump is written to the messages log in Junos OS 15.1F7, 16.1R4, 16.2R2, or higher, to provide additional information if your technical support representative needs it.
  • In Junos OS 17.3R1 or higher, the disable-pfe action is invoked instead of reset-fpc.
  • An alarm is raised.
Cause:

This indicates that the internal logic in the Hybrid Memory Cube has encountered a multi-bit uncorrectable error condition from which it cannot recover without a reset.

Solution:



Perform these steps to determine the cause and resolve the problem (if any).  Continue through each step until the problem is resolved.

  1. Collect the show command output.

    Capture the output to a file (in case you have to open a technical support case). To do this, configure your SSH client or terminal emulator to log the session.

    show log messages
    show log chassisd
    start shell pfe network <fpc#>
    show nvram
    show syslog messages
    exit

  2. Analyze the show command output.

    In the 'show log messages' output, review the events that occurred at or just before the appearance of the "response packet with a FATAL state is received from HMC with State 0x1f" message. These events frequently help identify the cause.

    • An RMA is required only if the same error condition occurs again after the MPC restart.
    • In Junos OS 17.3R1 or higher, the default action is disable-pfe instead of an FPC restart; this change was made under PR1186421.
    • In Junos OS 17.3R3, 17.4R2, or higher, memory built-in self-test (BIST) repair functionality is enabled to reduce exposure to multi-bit uncorrectable error conditions.
    • Contact your technical support representative immediately.
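To judge whether the error has recurred after a restart, one approach is to group the FATAL-state messages for each FPC into separate occurrences by time. The sketch below is an illustration under stated assumptions, not a Juniper tool: it assumes log lines are in chronological order, that the caller supplies the year (the syslog timestamps carry none), and that messages more than 10 minutes apart belong to different occurrences. The 10-minute gap and the function name `fatal_occurrences` are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Leading syslog timestamp, e.g. "Nov 6 01:48:52" or "Nov 24 11:15:13.984".
STAMP = re.compile(r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hms>\d{2}:\d{2}:\d{2})")
FATAL = re.compile(r"fpc(?P<fpc>\d+) .*A response packet with a FATAL state is received from HMC")

def fatal_occurrences(lines, year, gap=timedelta(minutes=10)):
    """Map each FPC slot to the start times of its distinct HMC FATAL occurrences."""
    spans = {}
    for line in lines:
        stamp, hit = STAMP.match(line), FATAL.search(line)
        if not (stamp and hit):
            continue
        when = datetime.strptime(
            f"{year} {stamp['mon']} {stamp['day']} {stamp['hms']}", "%Y %b %d %H:%M:%S"
        )
        per_fpc = spans.setdefault(int(hit["fpc"]), [])
        if not per_fpc or when - per_fpc[-1][1] > gap:
            per_fpc.append([when, when])   # new occurrence: [first seen, last seen]
        else:
            per_fpc[-1][1] = when          # same occurrence, extend last seen
    return {fpc: [first for first, _ in items] for fpc, items in spans.items()}

lines = [
    # Two lines from the sample above (Link0 and Link1, same event).
    "Nov 6 01:48:52 router fpc8 Cmerror Op Set: EA[2:0]::FATAL ERROR!! from EA[2:0]: HMCIF Rx: Link0: A response packet with a FATAL state is received from HMC - State: 0x1f, Count 2",
    "Nov 6 01:48:52 router fpc8 Cmerror Op Set: EA[2:0]::FATAL ERROR!! from EA[2:0]: HMCIF Rx: Link1: A response packet with a FATAL state is received from HMC - State: 0x1f, Count 2",
    # Hypothetical later line, invented here only to illustrate a recurrence.
    "Nov 6 03:10:07 router fpc8 Cmerror Op Set: EA[2:0]::FATAL ERROR!! from EA[2:0]: HMCIF Rx: Link0: A response packet with a FATAL state is received from HMC - State: 0x1f, Count 2",
]
result = fatal_occurrences(lines, year=2021)
recurring = [fpc for fpc, starts in result.items() if len(starts) > 1]
print(recurring)
```

An FPC listed in `recurring` has more than one occurrence in the log and is a candidate for the RMA discussion with technical support. A log that spans a year boundary would need extra handling that this sketch omits.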
 

This article is indexed in KB31893 - Primary Index of Articles for Troubleshooting PFE ASIC Syslog Events; tag EACHIPTSG


Tip: When looking at an event in the logs, it is important to focus on the first error message in a collection of syslog messages. The first error message is usually the cause of all the follow-on error messages. The follow-on collateral damage error messages can be ignored.

 

Modification History:
05/03/2019: Added BIST repair functionality to the Solution section

Note: KB Team - All changes to this article must be approved by the AI-Scripts Review team (pvs-scripts-review@juniper.net) before re-publishing.


AI-Scripts history: (Updated by AI-Scripts team only)

Date        KB Article Version   AI-Scripts PR (optional)   Notes
1/18/2018   1.0                  1334656                    New Syslog KB

 