
[EX] Understanding the soft error recovery feature on PFE

Article ID: KB34383 | Last Updated: 20 Jul 2019 | Version: 1.0
Summary:

This article describes the soft error recovery feature on the Packet Forwarding Engine (PFE) of EX4300 devices.

Symptoms:

An EX4300 may silently drop packets without raising any alarm or log message.

Further troubleshooting reveals incorrect PFE programming. This kind of issue can affect any PFE register or memory entry, including entries that the control plane originally programmed with the correct value and whose contents were later corrupted by a bit flip in the memory or register. Such corruption is detected as a parity error. Parity errors are rare on any single device, but they become more noticeable in large-scale deployments.
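To make the term "parity error" concrete, the following Python sketch models how a single parity bit stored alongside a data word lets hardware notice a flipped bit. It is a simplified illustration of even parity in general, not the PFE's actual memory protection scheme, which this article does not describe; the function names are hypothetical.

```python
def parity_bit(word):
    """Even-parity bit for a non-negative integer: 1 if the word has an
    odd number of 1 bits, else 0 (so word + parity has an even count)."""
    return bin(word).count("1") % 2


def store(word):
    """Model writing a word to memory together with its parity bit."""
    return (word, parity_bit(word))


def has_parity_error(entry):
    """Flag an entry whose recomputed parity no longer matches the
    parity bit saved when the entry was written."""
    word, saved_parity = entry
    return parity_bit(word) != saved_parity


def flip_bit(entry, bit):
    """Simulate a soft error: invert one data bit, leave parity untouched."""
    word, saved_parity = entry
    return (word ^ (1 << bit), saved_parity)


entry = store(0x04400001)
print(has_parity_error(entry))                          # False
print(has_parity_error(flip_bit(entry, 3)))             # True
print(has_parity_error(flip_bit(flip_bit(entry, 3), 7)))  # False: two flips cancel out
```

A single parity bit detects any odd number of flipped bits but cannot tell which bit changed, which is why recovery has to restore the whole entry from a known-good value rather than repair the bit in place.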

Cause:

The following reasons may cause a parity error:

  1. Emission of alpha particles from tiny amounts of radioactive material present in the chips.
  2. Cosmic rays that create energetic neutrons and protons.

Solution:

A parity error of this kind is a soft error: it corrupts the contents of the PFE internal memory or registers without permanently damaging the hardware, so it is correctable.

Soft error recovery is a software feature that restores the affected PFE internal memory and registers. When it is enabled, the PFE detects parity errors, attempts to recover from them on its own, and logs each event. A parity error may cause a small number of dropped packets, but once the entry is recovered, the PFE continues to forward traffic correctly.
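The recovery step can be pictured as a scan-and-restore loop. The Python sketch below is a hypothetical model that assumes software keeps a shadow copy of each entry as the control plane programmed it; the CLEAR_RESTORE messages in the recovery logs suggest a restore of this kind, but the actual Junos/ASIC implementation is not described in this article. The class and method names are illustrative only.

```python
class ProtectedTable:
    """Hypothetical model of a parity-protected PFE table with a
    software shadow copy used to restore corrupted entries."""

    def __init__(self, name, size):
        self.name = name
        self.hw = [(0, 0)] * size   # (word, parity bit) as held in "hardware"
        self.shadow = [0] * size    # values the control plane programmed

    @staticmethod
    def _parity(word):
        return bin(word).count("1") % 2

    def program(self, index, word):
        """Control-plane write: update the hardware entry and the shadow copy."""
        self.hw[index] = (word, self._parity(word))
        self.shadow[index] = word

    def scan_and_restore(self):
        """Check every entry (like a background memory scan) and rewrite any
        entry whose parity check fails from the shadow copy.
        Returns the indices that were restored."""
        restored = []
        for index, (word, saved_parity) in enumerate(self.hw):
            if self._parity(word) != saved_parity:
                self.program(index, self.shadow[index])
                restored.append(index)
        return restored


table = ProtectedTable("L2_ENTRY_1", 8)
table.program(5, 0x1C001444)
word, p = table.hw[5]
table.hw[5] = (word ^ (1 << 4), p)   # simulate a bit flip in entry 5
print(table.scan_and_restore())      # [5]
print(hex(table.hw[5][0]))           # 0x1c001444
```

The model also hints at why the memscan check matters: an entry that is never read or scanned can hold a flipped bit indefinitely without being noticed.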

The feature is enabled on the EX4300 PFE starting with Junos OS Release 14.1X53-D51. There is no Junos CLI configuration for it; the only way to enable it is to upgrade to a release that includes it.
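When auditing many switches, a quick first check is whether the running release is at or after 14.1X53-D51. The Python sketch below assumes version strings of the form 14.1X53-Dnn with an optional build suffix (for example 14.1X53-D46.7) and that D releases in this train compare by their number. This article only names the 14.1X53 train, so the helper deliberately returns None for any other release; check the release notes for those. The function name is hypothetical.

```python
import re
from typing import Optional

# First 14.1X53 "D" release with soft error recovery on the EX4300 PFE,
# as stated in this article.
FIRST_D_RELEASE = 51


def has_soft_error_recovery(version: str) -> Optional[bool]:
    """True/False for 14.1X53-Dnn[.b] releases; None for anything else,
    because this article does not cover other release trains."""
    m = re.fullmatch(r"14\.1X53-D(\d+)(?:\.\d+)?", version.strip())
    if m is None:
        return None
    return int(m.group(1)) >= FIRST_D_RELEASE


print(has_soft_error_recovery("14.1X53-D46.7"))  # False
print(has_soft_error_recovery("14.1X53-D51"))    # True
print(has_soft_error_recovery("18.4R2"))         # None
```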

Use the following steps to verify whether the feature is enabled:

start shell
root@ex4300:0% cprod -A fpc0 -c 'set ex getconfig' | grep parity
parity_correction = 1  <-- 1 means parity correction enabled
parity_enable = 1      <-- 1 means parity check enabled

root@s08-9:RE:0% vty fpc0
BSD platform (QorIQ P202H processor, 0MB memory, 0KB flash)

(vty)# set exbcm bcmshell "memscan"
MemSCAN: Running on unit 0    <-- "Running" means memscan is active; otherwise, parity errors in dynamic memories may be detected only when the affected entry is accessed
MemSCAN:   Interval: 100000 usec
MemSCAN:   Rate: 64
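If you collect these outputs from many switches, the check can be automated. The Python sketch below parses text already captured from the two commands above; how you capture it is outside its scope. It assumes the output lines look like the sample shown (the "<--" annotations are part of this article, not the device output, but the parser tolerates them). What a non-running memscan prints is not shown here, so anything other than the "Running" line is treated as not running. The helper names are hypothetical.

```python
import re


def parse_parity_flags(cprod_output: str) -> dict:
    """Extract 'parity_* = N' pairs, e.g. parity_correction = 1."""
    flags = {}
    for line in cprod_output.splitlines():
        m = re.match(r"\s*(parity_\w+)\s*=\s*(\d+)", line)
        if m:
            flags[m.group(1)] = int(m.group(2))
    return flags


def memscan_running(memscan_output: str) -> bool:
    """True if the output contains 'MemSCAN: Running on unit N'."""
    return re.search(r"MemSCAN:\s+Running on unit \d+", memscan_output) is not None


def soft_error_recovery_active(cprod_output: str, memscan_output: str) -> bool:
    """Both parity flags must be 1 and memscan must report Running."""
    flags = parse_parity_flags(cprod_output)
    return (flags.get("parity_enable") == 1
            and flags.get("parity_correction") == 1
            and memscan_running(memscan_output))


cprod = "parity_correction = 1\nparity_enable = 1\n"
memscan = "MemSCAN: Running on unit 0\nMemSCAN:   Interval: 100000 usec\n"
print(soft_error_recovery_active(cprod, memscan))  # True
```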

The following are examples of parity error detection and recovery logs. If you see such logs, you can ignore them, because the PFE has already recovered from the error.

[Fri May 24 09:48:44 2019 LOG: Err] Unit: 0 
[Fri May 24 09:48:44 2019 LOG: Err] Mem:
[Fri May 24 09:48:44 2019 LOG: Err] Parity error..
[Fri May 24 09:48:44 2019 LOG: Err] Error in: SBUS transaction.
[Fri May 24 09:48:44 2019 LOG: Err] Blk: 2, Address: 0x04400001, base: 0x10, stage: 1, index: 1
[Fri May 24 09:48:44 2019 LOG: Err] Unit 0: mem: 478=EGR_DVP_ATTRIBUTE_1 blkoffset:4
[Fri May 24 09:48:44 2019 LOG: Err] Unit 0: CLEAR_RESTORE: EGR_DVP_ATTRIBUTE_1[478] blk: epipe0 index: 1 : [2][4400000]   // indicates that the register error was cleared


[Fri May 24 09:37:53 2019 LOG: Err] Unit: 0 
[Fri May 24 09:37:53 2019 LOG: Err] Mem:
[Fri May 24 09:37:53 2019 LOG: Err] Parity error..
[Fri May 24 09:37:53 2019 LOG: Err] Error in: SOP cell.
[Fri May 24 09:37:53 2019 LOG: Err] Blk: 16, Address: 0x00001444, base: 0x0, stage: 0, index: 5188
[Fri May 24 09:37:53 2019 LOG: Err] Unit 0: mem: 3678=RAW_ENTRY_TABLE blkoffset:8
[Fri May 24 09:37:53 2019 LOG: Debug] STATUS: 0x00000083
[Fri May 24 09:37:53 2019 LOG: Debug] OPCODE: 0x1d000200
[Fri May 24 09:37:53 2019 LOG: Debug] START ADDR: 0x79c9cb60
[Fri May 24 09:37:53 2019 LOG: Debug] CUR ADDR: 0x1c001400
[Fri May 24 09:37:53 2019 LOG: Err] _soc_mem_array_sbusdma_read: L2_ENTRY_1.ism0 failed(ERR)
[Fri May 24 09:37:53 2019 LOG: Err] H/W received sbus nack with error bit set.
[Fri May 24 09:37:53 2019 LOG: Err] Unit: 0 
[Fri May 24 09:37:53 2019 LOG: Err] Multiple:
[Fri May 24 09:37:53 2019 LOG: Err] Mem:
[Fri May 24 09:37:53 2019 LOG: Err] Parity error..
[Fri May 24 09:37:53 2019 LOG: Err] Error in: SBUS transaction.
[Fri May 24 09:37:53 2019 LOG: Err] Blk: 16, Address: 0x1c001444, base: 0x0, stage: 7, index: 5188
[Fri May 24 09:37:53 2019 LOG: Err] Unit 0: mem: 2017=L2_ENTRY_1 blkoffset:8
[Fri May 24 09:37:53 2019 LOG: Err] Unit 0: CLEAR_RESTORE: L2_ENTRY_2[2018] blk: ism0 index: 2594 : [16][1c000000]    // indicates that the memory error was cleared
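To review recovery events across many log files, you can pull out the CLEAR_RESTORE lines, which mark a cleared error. The Python sketch below assumes the line layout shown in the sample logs above (timestamp, "Unit N: CLEAR_RESTORE:", memory name with its numeric ID in brackets, block, index); other Junos releases may format these messages differently. The Restore record and function name are hypothetical.

```python
import re
from typing import List, NamedTuple


class Restore(NamedTuple):
    timestamp: str
    unit: int
    memory: str   # e.g. EGR_DVP_ATTRIBUTE_1
    mem_id: int   # number in brackets after the memory name
    block: str    # e.g. epipe0, ism0
    index: int    # entry index that was restored


CLEAR_RESTORE_RE = re.compile(
    r"\[(?P<ts>[^\]]+?) LOG: \w+\] Unit (?P<unit>\d+): CLEAR_RESTORE: "
    r"(?P<mem>\w+)\[(?P<id>\d+)\] blk: (?P<blk>\w+) index: (?P<idx>\d+)"
)


def find_restores(log_text: str) -> List[Restore]:
    """Return one Restore record per CLEAR_RESTORE line in log_text."""
    return [
        Restore(m["ts"], int(m["unit"]), m["mem"], int(m["id"]),
                m["blk"], int(m["idx"]))
        for m in CLEAR_RESTORE_RE.finditer(log_text)
    ]


sample = (
    "[Fri May 24 09:48:44 2019 LOG: Err] Unit 0: CLEAR_RESTORE: "
    "EGR_DVP_ATTRIBUTE_1[478] blk: epipe0 index: 1 : [2][4400000]\n"
)
for r in find_restores(sample):
    print(r.timestamp, r.memory, r.block, r.index)
# Fri May 24 09:48:44 2019 EGR_DVP_ATTRIBUTE_1 epipe0 1
```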
