[PTX] Alarm "Major Errors - PE Error code: 0x210001"

Article ID: KB36523  KB Last Updated: 08 Mar 2021  Version: 1.0
Summary:
 

The following alarm may be seen on PTX routers when an HMC error is detected:

Alarm time               Class  Description
2021-01-31 02:37:43 PST  Major  FPC 0 Major Errors - PE Error code: 0x210001

This article explains why the above alarm may persist after the HMC error has been auto-corrected, and shows that rebooting the line card resolves the problem.

 

Symptoms:
 

When a hybrid memory cube (HMC) error is detected on any Packet Forwarding Engine (PFE), the following log messages appear on the router along with the above alarm:

Jan 31 02:37:08.190  pr05.iad3-re0 fpc0 pechip_q_cm_per_voq_drop_stat_get:3827: PE[1]: CM VOQ Stat HMC read(hmc_addr = 27710000) failed 1000 (generic failure)
Jan 31 02:38:35.980  pr05.iad3-re0 fpc0 Cmerror Op Set: PE Chip::FATAL ERROR!! from PE1[1]: HMCIF: hmcif access control not finish read with timeout  
Jan 31 02:38:45.469  pr05.iad3-re0 fpc0 Cmerror Op Set: PE Chip::FATAL ERROR!! from PE1[1]: HMCIF: Link3: HMC Fatal Error cmd:62 lng:1 ltag:1 dinv:0 errstat:127 err_cnt:0x40000000 
Jan 31 02:38:57.349  pr05.iad3-re0 fpc0 Cmerror Op Set: PE Chip::FATAL ERROR!! from PE1[1]: HMCIF: Link4: HMC Fatal Error cmd:62 lng:1 ltag:1 dinv:0 errstat:127 err_cnt:0x40000000 
Jan 31 02:39:02.406  pr05.iad3-re0 fpc0 hmc_eri_config_access, HMC 9, read eri 10 timeout error
Jan 31 02:39:02.407  pr05.iad3-re0 fpc0   Dumping Micron HMC 9 ADDENDUM DUMP 60 entries ...
Jan 31 02:39:02.459  pr05.iad3-re0 fpc0   Dumping Micron HMC 9 DUBUG DUMP 384 entries ...
Jan 31 02:39:02.516  pr05.iad3-re0 fpc0   Dumping Micron HMC 9 FATAL ERR DUMP 3181 entries ...
Jan 31 02:39:03.189  pr05.iad3-re0 fpc0 cmsngfpc_hmc_temp_check, HMC 9 PFE1-HMC1-DIE run time info read error
Jan 31 02:39:03.193  pr05.iad3-re0 fpc0 Cmerror Op Set: Generic HMC::FATAL ERROR!! from HMC09-01-11: eri timeout error 

Jan 31 02:39:02.200  pr05.iad3-re0 fpc0 Cmerror Op Set: Host Loopback: HOST LOOPBACK WEDGE DETECTED IN PATH ID 2  
Jan 31 02:39:02.220  pr05.iad3-re0 fpc0 Cmerror Op Set: Host Loopback: HOST LOOPBACK WEDGE DETECTED IN PATH ID 3  

Jan 31 02:39:02.322  pr05.iad3-re0 fpc0 pechip_set_pfe_disabled:976: PECHIP[1]: PFE Marked Disabled !!!
Jan 31 02:39:02.335  pr05.iad3-re0 fpc0 pechip_set_pfe_disabled:976: PECHIP[0]: PFE Marked Disabled !!!
Jan 31 02:39:02.335  pr05.iad3-re0 fpc0 pechip_set_pfe_disabled:983: PECHIP[0]: Deregisterd TOE PIO Vectors !!!
Jan 31 02:39:02.336  pr05.iad3-re0 fpc0 JPRDS_FDB:ERR:jprds_fdb_set_pfe_disabled(),7317: pfe 0 marked disabled 

Even when the HMC error is auto-corrected, the alarm persists because the PFE has been disabled. A host loopback wedge is also detected, as shown in the log messages above.
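When many log lines have to be reviewed, a short script can summarize which PE chip and HMC interface links reported fatal errors and which PFEs were marked disabled. The following is a minimal sketch, assuming the messages have the exact wording of the samples above; the regular expressions and the summarize_hmc_events() helper are hypothetical and not part of Junos, and other releases may format these messages differently.

```python
import re
from collections import defaultdict

# Patterns are modeled on the sample messages in this article only (assumption);
# other Junos releases may word these messages differently.
HMC_FATAL = re.compile(
    r"from (?P<pe>PE\d+\[\d+\]): HMCIF: Link(?P<link>\d+): HMC Fatal Error"
    r".*?errstat:(?P<errstat>\d+)"
)
PFE_DISABLED = re.compile(r"PECHIP\[(?P<pfe>\d+)\]: PFE Marked Disabled")


def summarize_hmc_events(lines):
    """Return ({pe: [(link, errstat), ...]}, sorted list of disabled PFE numbers)."""
    fatal = defaultdict(list)
    disabled = set()
    for line in lines:
        m = HMC_FATAL.search(line)
        if m:
            # Record which HMC interface link failed and its reported errstat.
            fatal[m.group("pe")].append((int(m.group("link")), int(m.group("errstat"))))
            continue
        m = PFE_DISABLED.search(line)
        if m:
            disabled.add(int(m.group("pfe")))
    return dict(fatal), sorted(disabled)


if __name__ == "__main__":
    sample = [
        "Jan 31 02:38:45.469  pr05.iad3-re0 fpc0 Cmerror Op Set: PE Chip::FATAL ERROR!! "
        "from PE1[1]: HMCIF: Link3: HMC Fatal Error cmd:62 lng:1 ltag:1 dinv:0 errstat:127 err_cnt:0x40000000",
        "Jan 31 02:39:02.322  pr05.iad3-re0 fpc0 pechip_set_pfe_disabled:976: PECHIP[1]: PFE Marked Disabled !!!",
    ]
    print(summarize_hmc_events(sample))
```

Matching on the message text rather than on fixed column positions keeps the sketch tolerant of differences in timestamp and hostname width.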

user@router-re0> show chassis fpc errors
FPC  Level Occurred Cleared Threshold Action-Taken Action
0   Minor      0      0     10      0   LOG|
    Major    563    561      1   1493 GET STATE|ALARM|
    Fatal     11      0      1     22 DISABLE PFE
    Pfe-State: pfe-0 -DISABLED | pfe-1 -DISABLED | pfe-2 -ENABLED | pfe-3 -ENABLED | pfe-4 -ENABLED | pfe-5 -ENABLED |
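The Pfe-State line of this output shows which PFEs are disabled. As an illustration, the following minimal sketch extracts those states from captured command output; it assumes the "pfe-N -STATE |" layout shown in the sample above, and the parse_pfe_state() and disabled_pfes() helpers are hypothetical.

```python
import re

# Matches entries such as "pfe-0 -DISABLED" in the Pfe-State line; the spacing
# is assumed from the sample "show chassis fpc errors" output in this article.
PFE_STATE = re.compile(r"pfe-(\d+)\s*-(\w+)")


def parse_pfe_state(output):
    """Map PFE number -> state (for example 'ENABLED' or 'DISABLED')."""
    states = {}
    for line in output.splitlines():
        if "Pfe-State:" in line:
            for pfe, state in PFE_STATE.findall(line):
                states[int(pfe)] = state
    return states


def disabled_pfes(output):
    """Return the sorted PFE numbers whose state is DISABLED."""
    return sorted(pfe for pfe, state in parse_pfe_state(output).items() if state == "DISABLED")


if __name__ == "__main__":
    sample = (
        "    Pfe-State: pfe-0 -DISABLED | pfe-1 -DISABLED | pfe-2 -ENABLED | "
        "pfe-3 -ENABLED | pfe-4 -ENABLED | pfe-5 -ENABLED |"
    )
    print(disabled_pfes(sample))
```

With the sample output above, PFEs 0 and 1 are reported as disabled, which matches the "PFE Marked Disabled" log messages.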

To confirm that the HMC error has been cleared, collect the output below from the affected FPC and PFE, and verify that the errorstat value is 0x0.

Log in to the FPC by using the following command:

start shell pfe network fpc<fpc_slot>

Collect the following output, where x is the number of the affected PE chip (for example, 1 for PE[1] in the logs above), to determine whether the error was cleared:

bringup jspec read pechip[x] register hmcif link 0 hmc_err_log_0
bringup jspec read pechip[x] register hmcif link 1 hmc_err_log_0
bringup jspec read pechip[x] register hmcif link 2 hmc_err_log_0
bringup jspec read pechip[x] register hmcif link 3 hmc_err_log_0
bringup jspec read pechip[x] register hmcif link 4 hmc_err_log_0
bringup jspec read pechip[x] register hmcif link 5 hmc_err_log_0

The output may show that the error has been cleared. However, the PFE remains disabled because of the wedge that was detected as a result of the earlier HMC error.
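Because the same register read must be repeated for links 0 through 5 of each affected PE chip, the commands can be generated rather than typed by hand. This is a minimal sketch; the hmc_err_log_commands() helper is hypothetical, and it only builds the command strings listed in this article (it does not connect to the router or interpret the register output).

```python
def hmc_err_log_commands(pechip, links=range(6)):
    """Build the hmc_err_log_0 read commands for one PE chip.

    pechip: the PE chip number substituted for x (for example, 1 for PE[1]).
    links: HMC interface links to read; this article reads links 0 through 5.
    """
    if not isinstance(pechip, int) or pechip < 0:
        raise ValueError("pechip must be a non-negative integer")
    return [
        f"bringup jspec read pechip[{pechip}] register hmcif link {link} hmc_err_log_0"
        for link in links
    ]


if __name__ == "__main__":
    # Print the commands for PE chip 1, ready to paste into the PFE shell.
    for cmd in hmc_err_log_commands(1):
        print(cmd)
```

The generated commands are entered in the PFE shell of the affected FPC, one per line, in the same way as the commands listed above.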

 

Cause:
 

Although the HMC error is a transient hardware error that is auto-corrected, it can sometimes cause a wedge. When this happens, the PFE is disabled and the interfaces on the affected PFE are shut down to prevent the impact from spreading to other PFEs.

 

Solution:
 

Reboot the line card (FPC) to clear the alarm and bring the disabled PFEs back up. The reboot also clears the HMC error if it was not auto-corrected.

 
