[MX] False CB 0/1 failure alarms triggered by excessive FPM related logs


Article ID: KB37358  KB Last Updated: 07 Sep 2021  Version: 1.0
Summary:

Intermittent false "CB 0 Failure" or "CB 1 Failure" major alarms may be raised and then clear themselves almost immediately. They appear in the messages log and are accompanied by excessive FPM-related entries in the chassisd log.

This article describes how to troubleshoot this issue.

Symptoms:
  1. On RE0 (primary RE), "CB 0 Failure" major alarms appear in the messages log and clear themselves almost immediately:
Jun  8 02:40:31.072 2021  router-re0 alarmd[10448]: %DAEMON-4: Alarm set: CB color=RED, class=CHASSIS, reason=CB 0 Failure  <--- Major alarm is set
Jun  8 02:40:31.072 2021  router-re0 craftd[7281]: %DAEMON-4:  Major alarm set, CB 0 Failure
Jun  8 02:40:32.082 2021  router-re0 alarmd[10448]: %DAEMON-4: Alarm cleared: CB color=RED, class=CHASSIS, reason=CB 0 Failure
Jun  8 02:40:32.082 2021  router-re0 craftd[7281]: %DAEMON-4: Major alarm cleared, CB 0 Failure  <--- Major alarm is cleared
  2. The chassisd log shows an excessive number of FPM entries, which indicates that too many interrupts are being raised by interaction with the FPM. When CHASSISD_FPM_STORM_STATUS is logged, the "CB 0 Failure" alarm is raised and then clears itself immediately.
Jun  8 02:40:28  stormchaser_get_cps: last call time=1733058:391812   now time=1733095:569396  dt=37:177584
Jun  8 02:40:28  fpm_tiny_tcb_intr tcb_ints_pending 0x30210000
Jun  8 02:40:28  fpm_tiny_CH_PRS_CHG handling CH_PRS interrupt
Jun  8 02:40:28  fpm_tiny_tcb_intr re-enabling interrupts (0x00038000)
Jun  8 02:40:28  exit fpm_tiny_tcb_intr ...
Jun  8 02:40:28  stormchaser_get_cps: last call time=1733095:569396   now time=1733095:671171  dt=0:101775
Jun  8 02:40:28  fpm_tiny_tcb_intr tcb_ints_pending 0x30210000
Jun  8 02:40:28  fpm_tiny_CH_PRS_CHG handling CH_PRS interrupt
Jun  8 02:40:28  fpm_tiny_tcb_intr re-enabling interrupts (0x00038000)
Jun  8 02:40:28  exit fpm_tiny_tcb_intr ...
Jun  8 02:40:29  stormchaser_get_cps: last call time=1733095:569396   now time=1733096:261145  dt=0:691749
Jun  8 02:40:29  fpm_tiny_tcb_intr tcb_ints_pending 0x30210000
Jun  8 02:40:29  fpm_tiny_CH_PRS_CHG handling CH_PRS interrupt
Jun  8 02:40:29  fpm_tiny_tcb_intr re-enabling interrupts (0x00038000)
Jun  8 02:40:29  exit fpm_tiny_tcb_intr ...
Jun  8 02:40:29  stormchaser_get_cps: last call time=1733095:569396   now time=1733096:261366  dt=0:691970
Jun  8 02:40:31 CHASSISD_FPM_STORM_STATUS: fpm_tiny_tcb_storm_state_change: tcb storm active
Jun  8 02:40:31  send: red alarm set, device CB 0, reason CB 0 Failure <--- CB 0 Failure alarm is raised after CHASSISD_FPM_STORM_STATUS is logged
Jun  8 02:40:31  stormchaser_get_cps: last call time=1733095:569396   now time=1733098:214079  dt=2:644683
Jun  8 02:40:31  fpm_tiny_tcb_intr tcb_ints_pending 0x30210000
Jun  8 02:40:31  fpm_tiny_CH_PRS_CHG handling CH_PRS interrupt
Jun  8 02:40:31  fpm_tiny_tcb_intr re-enabling interrupts (0x00038000)
Jun  8 02:40:31  exit fpm_tiny_tcb_intr ...
Jun  8 02:40:32 CHASSISD_FPM_STORM_STATUS: fpm_tiny_tcb_storm_state_change: tcb storm active
Jun  8 02:40:32  send: red alarm clear, device CB 0, reason CB 0 Failure <--- CB 0 Failure alarm is cleared after a second CHASSISD_FPM_STORM_STATUS is logged
  3. On RE1 (backup RE), "CB 1 Failure" major alarms can also appear, accompanied by the same excessive FPM logs.
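The set/clear pairs described above can be picked out of a saved copy of the messages log with a short script. The following Python sketch is only an illustration and not a Juniper tool: the function name `transient_cb_alarms`, the regular expression, and the 5-second `max_seconds` threshold are assumptions made for this example, and the timestamp format is inferred from the sample lines in this article.

```python
import re
from datetime import datetime

# Matches alarmd lines such as:
#   Jun  8 02:40:31.072 2021  router-re0 alarmd[10448]: %DAEMON-4: Alarm set: ... reason=CB 0 Failure
# The pattern is an assumption based on the sample log lines in this article.
ALARM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\.\d+\s+\d{4})\s+.*?"
    r"alarmd\[\d+\]:.*?Alarm (?P<action>set|cleared):"
    r".*?reason=(?P<reason>CB \d+ Failure)"
)


def parse_ts(text):
    """Parse a timestamp like 'Jun  8 02:40:31.072 2021'."""
    normalized = " ".join(text.split())  # collapse the double space before the day
    return datetime.strptime(normalized, "%b %d %H:%M:%S.%f %Y")


def transient_cb_alarms(lines, max_seconds=5.0):
    """Return (reason, duration_in_seconds) for CB alarms cleared within max_seconds.

    max_seconds is a hypothetical threshold for what counts as "self-cleared".
    """
    pending = {}  # reason -> time the alarm was set
    transient = []
    for line in lines:
        match = ALARM_RE.search(line.strip())
        if not match:
            continue
        when = parse_ts(match["ts"])
        reason = match["reason"]
        if match["action"] == "set":
            pending[reason] = when
        elif reason in pending:
            duration = (when - pending.pop(reason)).total_seconds()
            if duration <= max_seconds:
                transient.append((reason, duration))
    return transient
```

The function accepts any iterable of log lines, so it can be given an open file object for a saved log. A long list of short-lived "CB 0 Failure" or "CB 1 Failure" entries would be consistent with the symptom described in this article, but it does not confirm the cause by itself.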
Solution:

This issue is caused by excessive ghost interrupts between the FPM and the RE/CB, which are mostly hardware related. Follow the steps below to isolate the cause:

  1. Check for abnormalities in the FPM:
  • Temporarily remove the FPM and check whether the FPM logs stop. Then replace the FPM and monitor whether the issue is resolved.
  • Check that the FPM ribbon cable is seated properly.
  2. Check that the node is properly grounded electrically.
  3. Check for any other reported hardware alarms. Then replace the corresponding hardware to see whether the issue is resolved.
  4. Power cycle (cold reboot) the node during a maintenance window to see whether the ghost interrupts clear.
  5. Replace the midplane.
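
To judge whether the FPM logs have stopped after a step such as temporarily removing the FPM, it can help to compare counts from the chassisd log before and after the change. The Python sketch below is a minimal illustration under stated assumptions: the function name `summarize_chassisd` is hypothetical, and the script treats each `fpm_tiny_tcb_intr tcb_ints_pending` line as one interrupt and each `CHASSISD_FPM_STORM_STATUS` line as one storm event, based only on the sample output shown in this article.

```python
import re
from collections import Counter

# Splits a chassisd line such as
#   Jun  8 02:40:28  fpm_tiny_tcb_intr tcb_ints_pending 0x30210000
# into a timestamp and a message. The layout is assumed from the samples above.
TS_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(.*)$")


def summarize_chassisd(lines):
    """Count FPM interrupt entries per second and list storm status events.

    Returns (interrupts, storms), where interrupts is a Counter keyed by
    timestamp and storms is a list of timestamps.
    """
    interrupts = Counter()
    storms = []
    for line in lines:
        match = TS_RE.match(line.strip())
        if not match:
            continue
        timestamp = " ".join(match.group(1).split())  # normalize spacing
        message = match.group(2)
        if message.startswith("fpm_tiny_tcb_intr tcb_ints_pending"):
            interrupts[timestamp] += 1
        elif message.startswith("CHASSISD_FPM_STORM_STATUS"):
            storms.append(timestamp)
    return interrupts, storms
```

If both the per-second interrupt counts and the storm list drop to zero after a hardware change, that suggests the changed component was involved. Continued entries point toward the remaining steps in the list.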