[PTX] PFE is disabled due to auto correctable non-fatal hardware error

Article ID: KB36521 | KB Last Updated: 17 Apr 2021 | Version: 1.0
Summary:
On PTX 10K platforms, a non-fatal, auto-correctable hardware error is reported as FATAL. As a result, the PFE that reports the error is disabled.
Symptoms:
  1. The following 'FATAL' error message for a PE chip on an FPC is repeatedly set and cleared in the /var/log/messages file:

    Feb  3 06:38:08.440  labbox-re0 fpc1 Cmerror Op Set: PE Chip::FATAL ERROR!! from PE5[5]: RT:correct1_protect_intr: 0x00000100: Fatal if it is detected LLMEM Error MEM:llmem, MEMTYPE: 1,
    Feb  3 06:38:08.466  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/21:0 (290) on pfe-5
    Feb  3 06:38:08.467  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/21:1 (291) on pfe-5
    Feb  3 06:38:08.467  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/21:2 (292) on pfe-5
    Feb  3 06:38:08.468  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/21:3 (293) on pfe-5
    Feb  3 06:38:08.469  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/23:0 (298) on pfe-5
    Feb  3 06:38:08.469  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/23:1 (299) on pfe-5
    Feb  3 06:38:08.470  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/23:2 (300) on pfe-5
    Feb  3 06:38:08.470  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/23:3 (301) on pfe-5
    Feb  3 06:38:08.471  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/25:0 (306) on pfe-5
    Feb  3 06:38:08.471  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/25:1 (307) on pfe-5
    Feb  3 06:38:08.472  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/25:2 (308) on pfe-5
    Feb  3 06:38:08.472  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/25:3 (309) on pfe-5
    Feb  3 06:38:08.473  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/29:0 (314) on pfe-5
    Feb  3 06:38:08.852  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/29:1 (315) on pfe-5
    Feb  3 06:38:09.485  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/29:2 (316) on pfe-5
    Feb  3 06:38:10.014  labbox-re0 fpc1 ISSUING ifd-down for et-1/0/29:3 (317) on pfe-5
    Feb  3 06:38:10.586  labbox-re0 fpc1 pechip_set_pfe_disabled:976: PECHIP[5]: PFE Marked Disabled !!!
    Feb  3 06:38:10.586  labbox-re0 fpc1 pechip_set_pfe_disabled:983: PECHIP[5]: Deregisterd TOE PIO Vectors !!!
    Feb  3 06:38:10.586  labbox-re0 fpc1 JPRDS_FDB:ERR:jprds_fdb_set_pfe_disabled(),7317: pfe 5 marked disabled
    Feb  3 06:39:29.381  labbox-re0 fpc1 Cmerror Op Clear: PE Chip::FATAL ERROR!! from PE5[5]: RT: Clear Fatal if it is detected LLMEM Error MEM:llmem, MEMTYPE: 1,
  2. When the vty command 'show cmerror module' is executed on the affected FPC, the error 'Name' field shows 'PECHIP_CMERROR_RT_CORRECT1_FSET_REG_CORRECTED_LLMEM_F', the 'Level' field shows 'Fatal', and the 'Occurred' and 'Cleared' counters keep increasing:

    user@device> request pfe execute command "show cmerror module" target fpc1 | no-more 
    SENT: Ukern command: show cmerror module
     
    Module-id  Name   Error-id     PFE  Level  Threshold  Count  Occurred  Cleared  Last-occurred(ms ago)  Name
       10  PE Chip
                      0x2100c2      2   Fatal      1        0      1923      1923     478029              PECHIP_CMERROR_RT_CORRECT1_FSET_REG_CORRECTED_LLMEM_F
  3. The affected PE chip is disabled and the interfaces associated with it go down. The following log message is seen:

    fpc1 pechip_set_pfe_disabled:976: PECHIP[2]: PFE Marked Disabled !!!     <<<<< PE chip is Disabled.
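
The symptoms above can be checked against a saved copy of the messages file with a short script. The following is a minimal Python sketch, not a Juniper tool: the function name `summarize` and both regular expressions are illustrative assumptions modelled on the sample log lines in this article, and the exact log wording may differ between Junos releases.

```python
import re
from collections import Counter

# Assumption: patterns are derived from the sample log lines in this article.
CMERROR_RE = re.compile(
    r"(?P<fpc>fpc\d+) Cmerror Op (?P<op>Set|Clear): "
    r"PE Chip::FATAL ERROR!! from PE(?P<pe>\d+)\[\d+\]"
)
DISABLED_RE = re.compile(
    r"(?P<fpc>fpc\d+) pechip_set_pfe_disabled:\d+: "
    r"PECHIP\[(?P<pe>\d+)\]: PFE Marked Disabled"
)


def summarize(lines):
    """Count Set/Clear events and collect PFEs that were marked disabled.

    Returns (events, disabled):
      events   - Counter keyed by (fpc, pe, op), where op is "Set" or "Clear"
      disabled - set of (fpc, pe) tuples seen in "PFE Marked Disabled" lines
    """
    events = Counter()
    disabled = set()
    for line in lines:
        match = CMERROR_RE.search(line)
        if match:
            key = (match["fpc"], int(match["pe"]), match["op"])
            events[key] += 1
            continue
        match = DISABLED_RE.search(line)
        if match:
            disabled.add((match["fpc"], int(match["pe"])))
    return events, disabled
```

Passing the lines of the messages file to `summarize` gives, per FPC and PE chip, how often the error was set and cleared, and which PFEs were marked disabled. A PE chip that appears in both results matches symptoms 1 and 3.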
Cause:

This non-fatal, auto-correctable error is incorrectly marked as fatal. The FPC therefore shuts down the affected PFE to ensure that the other PFEs are not affected.

Solution:

Restart the affected FPC to re-enable the PFE and bring the interfaces back up.
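
As an illustration, the FPC name that appears in the log lines (for example, fpc1) can be mapped to a restart command. This is a minimal Python sketch: the helper name `fpc_restart_command` is hypothetical, and it assumes the standard Junos operational command `request chassis fpc slot <slot> restart` applies to the platform and release in use. Restarting an FPC interrupts traffic on all of its interfaces.

```python
import re


def fpc_restart_command(fpc_name):
    """Build the Junos CLI restart command for a log FPC name such as 'fpc1'.

    Assumption: the slot number is the numeric suffix of the FPC name.
    """
    match = re.fullmatch(r"fpc(\d+)", fpc_name)
    if match is None:
        raise ValueError(f"unexpected FPC name: {fpc_name!r}")
    slot = int(match.group(1))
    return f"request chassis fpc slot {slot} restart"
```

For the FPC in the sample logs, `fpc_restart_command("fpc1")` returns the string `request chassis fpc slot 1 restart`, which is entered in operational mode on the device.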
