MPC6E reports XR2 JGCI Major CRC error with error code 0x250001
The XR2 CRC errors indicate marginal hardware, typically a board problem on the MPC6E, but possibly an XR2 memory issue. This is not a software issue.
A few reported CRC errors are not disruptive. If the count does not increase, no action is needed. However, if a burst of CRC errors occurs, a Major alarm is raised and the XR2 links need to be retrained by rebooting the MPC6E board. This is a Major alarm from the viewpoint of the CM error infrastructure, and a Fatal error from the viewpoint of the ASIC.
JTAC recommends manually rebooting the board. If CRC errors appear again, consider replacing the MPC6E board. To clear the error condition immediately, an automatic FPC restart can be triggered by setting the configuration knob interasic-linkerror-recovery-enable.
MPC6E Cmerror reports fatal CRC errors with 'XR2 Error code: 0x250001', but the MPC6E does not reboot.
*** messages ***
Oct 22 09:29:15 MX <163>Oct 22 09:29:15 MX fpc0 jgci_intf_log_interrupt_status: interface XR2CHIP(1)-intf-1 chan0 crc_count current 15 accumulated 15
Oct 22 09:29:15 MX fpc0 jgci_intf_log_interrupt_status: interface XR2CHIP(1)-intf-1 chan0 crc_count current 15 accumulated 15
Oct 22 09:29:15 MX fpc0 JGCI[XR2CHIP(1)-intf-1] JGCI_INT_REG_LKR_1_FORCED_RETRAIN seen
Oct 22 09:29:15 MX fpc0 JGCI[XR2CHIP(1)-intf-1] JGCI_INT_REG_LKR_0_FORCED_RETRAIN seen
Oct 22 09:29:15 MX tnp.tftpd[4558]: TFTPD_CONNECT_INFO: TFTP read from address 16 port 8 file /var/tmp/pfe_debug_commands
Oct 22 09:29:15 MX tnp.tftpd[4558]: TFTPD_SENDCOMPLETE_INFO: Sent 0 blocks of 1024 and 1 block of 42 for file '/var/tmp/pfe_debug_commands'
Oct 22 09:29:15 MX fpc0 Cmerror: Draining ASIC error message queue
Oct 22 09:29:15 MX fpc0 cmerror_process_queue: module = XR2CHIP(1)
Oct 22 09:29:16 MX fpc0 Cmerror: processing the task op_type 1 for level 1 level_count 2 occur_count 2 clear_count 0 level_threshold 1 level_action 0x6 item errid 2424833 item_threshold 1 item_count 0 sub_item errid 0 sub_item_state 0 item_timestamp 0 current times
Oct 22 09:29:16 MX fpc0 Cmerror: Level 1 count increment 3 occur_count 3 clear_count 0
Oct 22 09:29:16 MX fpc0 Error (0x250001), module: XR2CHIP(1), type: XR2 JGCI Major CRC error
Oct 22 09:29:16 MX fpc0 Cmerror: Level 1 count 3 (occur_count 3 clear_count 0)crossed threshold 1 action 0x6
Oct 22 09:29:16 MX fpc0 cmerror_take_action_helper: performing action 2 for level 1 err_id 0x250001
Oct 22 09:29:16 MX tnp.tftpd[4560]: TFTPD_CONNECT_INFO: TFTP write from address 16 port 9 file /var/tmp/pfe_debug_info_RMPC0
Oct 22 09:29:16 MX fpc0 cmerror_take_action_helper: performing action 4 for level 1 err_id 0x250001
Oct 22 09:29:16 MX fpc0 Cmerror Op Set: XR2CHIP(1): CRC Errors:XR2CHIP(1):1 on jgci rx channel id 2
Oct 22 09:29:17 MX fpc0 cmerror_process_queue: module = XR2CHIP(1)
Oct 22 09:29:17 MX fpc0 Cmerror: processing the task op_type 1 for level 1 level_count 3 occur_count 3 clear_count 0 level_threshold 1 level_action 0x6 item errid 2424833 item_threshold 1 item_count 1 sub_item errid 0 sub_item_state 0 item_timestamp -313367889 curr
Oct 22 09:29:17 MX fpc0 Cmerror: Level 1 count increment 4 occur_count 4 clear_count 0
Oct 22 09:29:17 MX fpc0 Error (0x250001), module: XR2CHIP(1), type: XR2 JGCI Major CRC error <-- Major alarm on CM error infra
Oct 22 09:29:17 MX fpc0 Cmerror: Level 1 count 4 (occur_count 4 clear_count 0)crossed threshold 1 action 0x6
Oct 22 09:29:17 MX fpc0 cmerror_take_action_helper: performing action 2 for level 1 err_id 0x250001
Oct 22 09:29:17 MX tnp.tftpd[4562]: TFTPD_CONNECT_INFO: TFTP read from address 16 port 10 file /var/tmp/pfe_debug_commands
Oct 22 09:29:17 MX tnp.tftpd[4562]: TFTPD_SENDCOMPLETE_INFO: Sent 0 blocks of 1024 and 1 block of 42 for file '/var/tmp/pfe_debug_commands'
Oct 22 09:29:17 MX tnp.tftpd[4560]: TFTPD_RECVCOMPLETE_INFO: Received 67 blocks of 1024 size for file '/var/tmp/pfe_debug_info_RMPC0.2'
Oct 22 09:29:18 MX tnp.tftpd[4564]: TFTPD_CONNECT_INFO: TFTP write from address 16 port 11 file /var/tmp/pfe_debug_info_RMPC0
Oct 22 09:29:18 MX fpc0 cmerror_take_action_helper: performing action 4 for level 1 err_id 0x250001
Oct 22 09:29:18 MX <163>Oct 22 09:29:18 MX fpc0 Cmerror Op Set: XR2CHIP(1): Fatal Errors:XR2CHIP(1):1 on jgci rx channel id 3 <-- Fatal asic error
Oct 22 09:29:18 MX fpc0 Cmerror Op Set: XR2CHIP(1): Fatal Errors:XR2CHIP(1):1 on jgci rx channel id 3
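The rule of thumb given earlier (a few CRC errors are harmless as long as the count stops increasing) can be checked against collected syslog. The following is a minimal Python sketch, not a Juniper tool. It assumes messages in the jgci_intf_log_interrupt_status format shown in the log above, each on a single unwrapped line. The names CRC_RE, crc_growth, and still_increasing are illustrative only.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the sample log in this article:
#   ... fpc0 jgci_intf_log_interrupt_status: interface XR2CHIP(1)-intf-1 chan0
#   crc_count current 15 accumulated 15   (all on one line)
CRC_RE = re.compile(
    r"(?P<fpc>fpc\d+) jgci_intf_log_interrupt_status: "
    r"interface (?P<intf>\S+) (?P<chan>chan\d+) "
    r"crc_count current (?P<current>\d+)\s+accumulated (?P<accumulated>\d+)"
)


def crc_growth(lines):
    """Return {(fpc, interface, channel): [accumulated counts in log order]}."""
    history = defaultdict(list)
    for line in lines:
        m = CRC_RE.search(line)
        if m:
            key = (m["fpc"], m["intf"], m["chan"])
            history[key].append(int(m["accumulated"]))
    return dict(history)


def still_increasing(counts):
    """True if the last accumulated sample is higher than the first one."""
    return len(counts) >= 2 and counts[-1] > counts[0]


if __name__ == "__main__":
    import sys

    # Usage: python crc_growth.py < messages
    for key, counts in crc_growth(sys.stdin).items():
        state = "increasing" if still_increasing(counts) else "stable"
        print(key, counts, state)
```

Because the same event can be logged twice (once with the <163> priority prefix), repeated identical samples are expected. They do not affect the result, since only a rise in the accumulated value is treated as growth.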
To trigger an automatic FPC reboot, configure the knob interasic-linkerror-recovery-enable. Configuring it on MX routers is recommended so that the error condition is cleared immediately.
Pio Poking
RMPC6(mx2020-re0 vty)# test tpio poke 1 long 0x02303e88 0x12
0x0002303e88: 0x12
RMPC6(mx2020-re0 vty)#
[Oct 24 09:49:53.536 LOG: Err] JGCI[XLCHIP(42)-intf-0] JGCI_INT_REG_LKR_0_FORCED_RETRAIN seen
[Oct 24 09:49:53.536 LOG: Err] Fatal JGCI error....FPC will restart to recover.... <-- FPC auto reboot
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: Draining ASIC error message queue
[Oct 24 09:49:53.536 LOG: Debug] cmerror_process_queue: module = XL[0:0]
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: processing the task op_type 1 for level 2 level_count 0 occur_count 0 clear_count 0 level_threshold 1 level_action 0x20 item errid 262232 item_threshold 1 item_count 0 sub_item errid 0 sub_item_state 0 item_timestamp 0 current times
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: Level 2 count increment 1 occur_count 1 clear_count 0
[Oct 24 09:49:53.536 LOG: Info] Error (0x40058), module: XL[0:0], type: JGCI Fatal Errors
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: Level 2 count 1 (occur_count 1 clear_count 0)crossed threshold 1 action 0x20
[Oct 24 09:49:53.536 LOG: Debug] cmerror_take_action_helper: performing action 20 for level 2 err_id 0x40058
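Two details in the Cmerror lines are easier to read once converted. The "item errid" field is printed in decimal, while the "Error (...)" line prints the same identifier in hexadecimal. In addition, the logged actions look consistent with level_action being a bitmask: 0x6 is followed by "performing action 2" and "performing action 4", and 0x20 by "performing action 20". The bitmask reading is an inference from these logs, not documented behavior, and the helper names in this Python sketch are illustrative.

```python
def errid_to_hex(errid):
    """Convert the decimal 'item errid' value to the hex form used in 'Error (...)'."""
    return hex(errid)


def split_action_mask(mask):
    """Split a level_action value into its set bits, formatted as hex digits.

    Assumption: level_action is a bitmask and each set bit is logged as a
    separate 'performing action N' line, with N printed in hex without '0x'.
    """
    bits = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            bits.append(format(bit, "x"))
        bit <<= 1
    return bits


if __name__ == "__main__":
    print(errid_to_hex(2424833))     # 0x250001
    print(errid_to_hex(262232))      # 0x40058
    print(split_action_mask(0x6))    # ['2', '4']
    print(split_action_mask(0x20))   # ['20']
```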
lab@mx2020-re0> show chassis alarms
9 alarms currently active
Alarm time               Class  Description
2018-10-24 18:47:03 JST  Major  FPC 6 Major Errors <-- major alarm is raised.
Enabling Knob
[edit]
lab@mx2020-re0# set chassis fpc 6 interasic-linkerror-recovery-enable
[edit]
lab@mx2020-re0# commit
re0:
configuration check succeeds
re1:
commit complete
re0:
commit complete
[edit]
lab@mx2020-re0#
Test Again
RMPC6(mx2020-re0 vty)# test tpio poke 1 long 0x02303e88 0x12
0x0002303e88: 0x12
RMPC6(mx2020-re0 vty)#
[Oct 24 09:49:53.536 LOG: Err] JGCI[XLCHIP(42)-intf-0] JGCI_INT_REG_LKR_0_FORCED_RETRAIN seen
[Oct 24 09:49:53.536 LOG: Err] Fatal JGCI error....FPC will restart to recover.... <-- FPC auto reboot
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: Draining ASIC error message queue
[Oct 24 09:49:53.536 LOG: Debug] cmerror_process_queue: module = XL[0:0]
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: processing the task op_type 1 for level 2 level_count 0 occur_count 0 clear_count 0 level_threshold 1 level_action 0x20 item errid 262232 item_threshold 1 item_count 0 sub_item errid 0 sub_item_state 0 item_timestamp 0 current times
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: Level 2 count increment 1 occur_count 1 clear_count 0
[Oct 24 09:49:53.536 LOG: Info] Error (0x40058), module: XL[0:0], type: JGCI Fatal Errors
[Oct 24 09:49:53.536 LOG: Debug] Cmerror: Level 2 count 1 (occur_count 1 clear_count 0)crossed threshold 1 action 0x20
[Oct 24 09:49:53.536 LOG: Debug] cmerror_take_action_helper: performing action 20 for level 2 err_id 0x40058
[edit]
lab@mx2020-re0# run show chassis fpc
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Empty
  1  Empty
  2  Empty
  3  Empty
  4  Empty
  5  Empty
  6  Present          Testing
  7  Empty
  8  Empty
  9  Empty
 10  Empty
 11  Empty
 12  Empty
 13  Empty
 14  Empty
 15  Empty
 16  Empty
 17  Empty
 18  Empty
 19  Empty