[PTX] Chassis shutdown due to high temperature on FPC

Article ID: KB36104 KB Last Updated: 28 Jul 2020 Version: 1.0
Summary:
This article explains why a PTX3000 device shuts down the entire chassis when a high temperature is detected on an FPC. This behavior is not ideal: when the whole device shuts down, every service that passes through it goes down as well. The article also describes how to prevent an entire chassis shutdown when temperature issues are seen on an FPC.
 
Symptoms:

Active Alarms:

user@host> show chassis alarms
2 alarms currently active
Alarm time               Class  Description
2020-03-22 17:25:28      Minor  FPC 8 Temperature Warm
2020-03-22 17:24:51      Major  Fan Tray 1 Failure

Syslog Message:

Mar 22 17:24:51  send: red alarm set, device Fan Tray 1, reason Fan Tray 1 Failure
<snip>
Mar 22 17:25:28 CHASSISD_FRU_HIGH_TEMP_CONDITION: FPC 8 TL1 temperature 86 over 85 degrees C (fan/impeller failure detected)
Mar 22 17:25:33 CHASSISD_FRU_HIGH_TEMP_CONDITION: FPC 8 TL1 temperature 86 over 85 degrees C (fan/impeller failure detected)
Mar 22 17:25:38 CHASSISD_FRU_HIGH_TEMP_CONDITION: FPC 8 TL1 temperature 88 over 85 degrees C (fan/impeller failure detected)
<logs snipped for brevity>
Mar 22 17:26:18 CHASSISD_TEMP_HOT_NOTICE: FPC 8 temperature of 96 degrees C is above limit (95 degrees)
Mar 22 17:26:18  send: red alarm set, device FPC 8, reason FPC 8 Temperature Hot
Mar 22 17:26:18  send: yellow alarm clear, device FPC 8, reason FPC 8 Temperature Warm
Mar 22 17:26:18 CHASSISD_FRU_OVER_TEMP_CONDITION: FPC 8 TL0 temperature 96 over 95 degrees C (fan/impeller failure detected); FRU will shut down in 240 seconds if condition persists
Mar 22 17:26:23 CHASSISD_TEMP_HOT_NOTICE: FPC 8 temperature of 96 degrees C is above limit (95 degrees)
<logs snipped for brevity>
Mar 22 17:28:53 CHASSISD_OVER_TEMP_SHUTDOWN_TIME: Chassis temperature above 105 degrees C for too long  (> 240 seconds); powering down all FRUs
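
When reviewing a longer messages file, it can help to pull the chassisd temperature events into a timeline. The following is a minimal Python sketch, not a Juniper tool. It assumes the log lines look exactly like the excerpt above; the function names parse_events and seconds_to_shutdown are hypothetical, and the year is supplied by the caller because these syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Sample lines copied from the syslog excerpt in this article.
SAMPLE_LOG = """\
Mar 22 17:25:28 CHASSISD_FRU_HIGH_TEMP_CONDITION: FPC 8 TL1 temperature 86 over 85 degrees C (fan/impeller failure detected)
Mar 22 17:25:38 CHASSISD_FRU_HIGH_TEMP_CONDITION: FPC 8 TL1 temperature 88 over 85 degrees C (fan/impeller failure detected)
Mar 22 17:26:18 CHASSISD_FRU_OVER_TEMP_CONDITION: FPC 8 TL0 temperature 96 over 95 degrees C (fan/impeller failure detected); FRU will shut down in 240 seconds if condition persists
Mar 22 17:28:53 CHASSISD_OVER_TEMP_SHUTDOWN_TIME: Chassis temperature above 105 degrees C for too long  (> 240 seconds); powering down all FRUs
"""

# Timestamp, then a CHASSISD_* event tag, then the free-form message.
EVENT_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(CHASSISD_\w+):\s+(.*)$")
# "<FRU> TLn temperature <reading> over <threshold> degrees C"
TEMP_RE = re.compile(r"^(.+?) TL\d+ temperature (\d+) over (\d+) degrees C")


def parse_events(text, year=2020):
    """Return one dict per CHASSISD_* line found in text (assumed log format)."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # skip "send: ..." alarm lines and snip markers
        stamp, tag, message = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        event = {"time": when, "tag": tag, "message": message}
        temp = TEMP_RE.match(message)
        if temp:
            event.update(
                fru=temp.group(1),
                reading=int(temp.group(2)),
                threshold=int(temp.group(3)),
            )
        events.append(event)
    return events


def seconds_to_shutdown(events):
    """Seconds from the first over-temp warning to the chassis power-down, or None."""
    start = next((e["time"] for e in events
                  if e["tag"] == "CHASSISD_FRU_OVER_TEMP_CONDITION"), None)
    end = next((e["time"] for e in events
                if e["tag"] == "CHASSISD_OVER_TEMP_SHUTDOWN_TIME"), None)
    if start is None or end is None:
        return None
    return int((end - start).total_seconds())
```

Calling seconds_to_shutdown(parse_events(SAMPLE_LOG)) reports the gap between the first logged over-temperature warning and the power-down message in these lines only. The excerpt is snipped, so the result says nothing about events that were left out.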

Cause:
  1. Device running on a single fan tray
  2. Transient issue with a fan tray
  3. Permanent fan tray failure
  4. Environmental conditions around the device are not adequate to maintain cooling
Solution:

The PTX3000 has two fan trays, one at the top (top fan tray) and one at the bottom (bottom fan tray), each in a separate cooling zone. The PTX3000 does not have fan-tray-level redundancy within a cooling zone. When all the fans on a single fan tray stop working, some FRUs in the failed cooling zone may reach the fire temperature (FIRE TEMP) condition, which triggers a shutdown of the entire chassis. This has a severe impact on a device in production.

To prevent this from happening, enable the "overtemp-quick-shutdown" knob.

Command to configure:

user@host# set chassis overtemp-quick-shutdown <1-240>

<1-240> is the range of the value, in seconds. The statement is entered in configuration mode.

Once this knob is set, chassisd tries to keep the FRUs in the failing cooling zone working by directing the fans in the non-failing zone to spin at full speed, which provides cooling to the FRUs in the other zone. As a result, only the FPCs in the affected zone are shut down, and the increased fan speed in the other zone adequately cools the SIBs and the remaining FPCs, so the chassis can continue to operate at half capacity. This helps bring the temperature into balance, which in turn prevents the entire chassis from shutting down.
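
Because the statement is hidden and has to be typed in full, a script that pushes configuration may want to build the line and check the 1-240 range before it is sent to a device. The following is a minimal Python sketch under that assumption. The helper name build_quick_shutdown_command is hypothetical, and the sketch only builds a string; it does not connect to or configure any device.

```python
def build_quick_shutdown_command(seconds):
    """Return the configuration statement for a value of `seconds` (1-240).

    The statement text and the 1-240 range are taken from this article;
    the helper itself is illustrative only.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("seconds must be an integer")
    if not 1 <= seconds <= 240:
        raise ValueError("seconds must be between 1 and 240")
    return f"set chassis overtemp-quick-shutdown {seconds}"
```

For example, build_quick_shutdown_command(60) returns the statement with 60 as the value, while 0 or 241 raises ValueError before anything reaches the CLI.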

WARNING: The hidden command above is meant to be used only in worst-case scenarios, to prevent the entire chassis from shutting down. It is recommended to consult JTAC before using it. Using the command without consulting JTAC is at the customer's own risk, because the behavior of a device may vary under different environmental conditions.

NOTES:

  1. Because the command is hidden, the Tab and Space keys do not help with command completion. Type the full command to prevent errors on the CLI.
  2. The above command is available in all releases starting with Junos 18.2.

 

If the above command does not prevent the chassis from shutting down, the device may have a non-rectifiable issue with temperature fluctuations that is causing the shutdown. In that case, collect the output of the commands below and contact JTAC for analysis.

user@host> show chassis temperature-thresholds
user@host> show log messages
user@host> show log chassisd
user@host> show chassis environment

 
