
[BTI] System hangs; reports management loss due to rapid link flapping in BTI821


Article ID: KB32670 KB Last Updated: 29 May 2018 Version: 1.0
Summary:

A mode mismatch between the SFP on the BTI821 and the SFP at the customer end can cause the BTI821 network element (NE) to lose management access from the network management system (NMS). Although the NE management IP still responds to ping, SNMP and SSH access fail and the system reports that startup has not completed. This article explains the cause of the following symptoms and provides a solution:

  • System startup failure
  • Slow system response
  • Increased CPU resource utilization
  • Rapid link flapping on an interface

 

Symptoms:

The NMS server (ProNX Service Manager) raises the BTI821 NE not reachable alarm when the NE does not respond to its status poll.

Login attempts to the node over Telnet or SSH are refused:

[root@psm ~]# telnet 172.26.78.33
Trying 172.26.78.33...
Connected to 172.26.78.33.
Escape character is '^]'.
% System startup uncompleted. Please waiting a moment and try it again: Connection refused.

Power-cycling the node restores management access, but the system responds slowly. The system performance data shows CPU utilization rising continuously:

BTI-SA-821# show pm system current
------------------------------------------
      System CPU & Memory Utilization
------------------------------------------
PM Status  | Enable
------------------------------------------
            | Utilization  | Threshold
------------------------------------------
CPU        |           89% |          70%
MEM        |          16% |          80%
------------------------------------------
Start Time | Jan02 20:45
Status     | Invalid
------------------------------------------
Current Idx| 15Minutes#20
------------------------------------------
BTI-SA-821#

The system logs show one of the GE interfaces flapping rapidly, with several link up/down transitions within the same second:

BTI-SA-821# show syslog reverse
...
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Up :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Down :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Up :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Down :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Up :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Down :eth-0-6
...

When the problematic interface is disabled, system CPU utilization and management response return to normal.
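
The flap rate visible in the syslog excerpt above can be quantified offline. The following Python sketch is one possible way to do it, assuming the log has been saved as text in the same layout as the excerpt; the regular expression, the function names, and the limit of two events per second are illustrative assumptions, not part of any BTI tooling.

```python
import re
from collections import Counter

# Pattern modelled on the excerpt above: timestamp, source tag,
# then "Link Up :<interface>" or "Link Down :<interface>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+.*?"
    r"Link (?P<state>Up|Down)\s*:(?P<ifname>\S+)\s*$"
)


def count_transitions(lines):
    """Count link events per (interface, one-second timestamp)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["ifname"], match["ts"])] += 1
    return counts


def flapping_interfaces(lines, max_events_per_second=2):
    """Interfaces with more events in any single second than the assumed limit."""
    counts = count_transitions(lines)
    return sorted({ifname for (ifname, _ts), n in counts.items()
                   if n > max_events_per_second})


# The six log lines quoted in this article.
SAMPLE = """\
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Up :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Down :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Up :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Down :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Up :eth-0-6
2018-03-13 21:04:14  MC103.TY9:eth-0-6 ENT IF  Link Down :eth-0-6
""".splitlines()

if __name__ == "__main__":
    print(flapping_interfaces(SAMPLE))
```

Run against the excerpt, the sketch counts six transitions on eth-0-6 within the single second 21:04:14 and reports that interface as flapping.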

 

Cause:

The eth-0-6 interface carries a newly deployed service on the BTI821 NE. The SFP at the BTI821 end is single-mode, whereas the SFP at the customer end may be multi-mode. This mode mismatch causes the link flapping.

Typically, every link flap is reported to the CPU as an interrupt, so a rapidly flapping link consumes an increasing share of CPU resources. Eventually the CPU saturates, no resources remain for other tasks, and the system hangs.

 

Solution:

To prevent this problem, enable errdisable for link flapping.

Once enabled, errdisable shuts down an interface on which link flapping is detected, which protects the system. After a configurable recovery interval (5 minutes by default), it brings the interface up again.
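
The detect-then-recover behavior described above can be pictured with a small Python model. This is only a sketch of the general idea, not the BTI821 implementation: the class name, the flap threshold, and the detection window are hypothetical, and only the 300-second recovery default comes from this article.

```python
from collections import deque


class ErrDisableModel:
    """Toy model of link-flap errdisable on a single interface."""

    def __init__(self, flap_threshold=10, window_s=10.0, recovery_s=300.0):
        # flap_threshold and window_s are hypothetical values;
        # recovery_s=300 is the 5-minute default quoted in the article.
        self.flap_threshold = flap_threshold
        self.window_s = window_s
        self.recovery_s = recovery_s
        self.events = deque()    # timestamps of recent link transitions
        self.disabled_at = None  # time the port was shut down, or None

    def is_disabled(self, now):
        """Return True while the port is shut; re-enable it after recovery_s."""
        if self.disabled_at is not None and now - self.disabled_at >= self.recovery_s:
            self.disabled_at = None
            self.events.clear()
        return self.disabled_at is not None

    def link_event(self, now):
        """Record one link up/down transition; return True if the port is shut."""
        if self.is_disabled(now):
            return True  # port is held down, so the event is not processed
        self.events.append(now)
        # Drop transitions that fall outside the detection window.
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()
        if len(self.events) >= self.flap_threshold:
            self.disabled_at = now
        return self.disabled_at is not None
```

In this model, once the number of transitions inside the window reaches the threshold, the interface is held down and further transitions are ignored until the recovery interval expires, which is what stops the flood of link events from reaching the CPU.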

The commands that enable errdisable detection and errdisable recovery for link flapping are as follows:

  • errdisable detect reason link-flap
  • errdisable recovery reason link-flap

The recovery interval (the time after which the interface is brought up again) and the link-flap threshold can be adjusted with the following commands, respectively:

  • errdisable recovery interval <30-86400 seconds>
  • errdisable flap reason link-flap
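
When these commands are pushed from a script, it can help to validate the recovery interval against the documented 30 to 86400 second range first. The Python helper below is a minimal sketch under that assumption; the function name is hypothetical, and it assumes the interval is entered as a plain number of seconds after `errdisable recovery interval`. The flap-threshold command is left out because its arguments are not given in this article.

```python
def errdisable_commands(recovery_interval_s=None):
    """Build the link-flap errdisable command list from this article.

    recovery_interval_s is optional; when given it must be an integer
    within the documented range of 30 to 86400 seconds.
    """
    commands = [
        "errdisable detect reason link-flap",
        "errdisable recovery reason link-flap",
    ]
    if recovery_interval_s is not None:
        if not isinstance(recovery_interval_s, int) or not 30 <= recovery_interval_s <= 86400:
            raise ValueError("recovery interval must be an integer from 30 to 86400 seconds")
        # Assumed syntax: the placeholder is replaced by a bare number of seconds.
        commands.append(f"errdisable recovery interval {recovery_interval_s}")
    return commands


if __name__ == "__main__":
    for command in errdisable_commands(600):
        print(command)
```

Calling the helper without an argument returns only the two enabling commands, which leaves the recovery interval at its 5-minute default.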

 
