[PTX] High CPU in RE-DUO-2600 RE due to process irq17: uhci1 uhci4*

Article ID: KB35279 KB Last Updated: 27 Mar 2021 Version: 2.0
Summary:

This article explains why the RE-DUO-2600 Routing Engine (RE) on PTX3000 and PTX5000 routers may show high CPU utilization caused by the interrupt thread irq17: uhci1 uhci4*, and describes the corrective actions that resolve the issue.

Symptoms:

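CPU utilization on the RE remains high because of an interrupt storm on irq17 that originates from the Routing Engine/Control Board (RE/CB) hardware. The following command outputs illustrate the condition.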

User@PTX3000-re0> show chassis hardware

Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                JN1258164AJC      PTX3000
Midplane         REV 25   750-044645   ACMJ2769          Backplane
FPM              REV 07   760-044663   ACNF3954          Front Panel Display
PSM 0            REV 04   740-044980   1EDJ5310794       DC 12V Power Supply
PSM 1            REV 04   740-044980   1EDJ5310779       DC 12V Power Supply
PSM 2            REV 04   740-044980   1EDJ5310789       DC 12V Power Supply
PSM 3            REV 04   740-044980   1EDJ5310815       DC 12V Power Supply
Routing Engine 0 REV 12   740-026942   P737A-006537      RE-DUO-2600
  ad0    3807 MB  SMART CF             SPG2014121202330  Compact Flash
  ad1   59488 MB  VSFA18PI064G-EM      35187-203         Disk 1
Routing Engine 1 REV 12   740-026942   P737A-005826      RE-DUO-2600
  ad0    3807 MB  SMART CF             SPG2014081302009  Compact Flash
  ad1   57241 MB  SGC13T064-TS9KBC-EM  SO141015AS1569841 Disk 1
CB 0             REV 17   750-044656   ACDS7924          Control Board
CB 1             REV 17   750-044656   ACDR9985          Control Board

User@PTX3000-re0> show system processes extensive

last pid: 27215;  load averages:  1.17,  0.86,  0.81  up 916+23:25:36    11:05:06
166 processes: 3 running, 141 sleeping, 1 zombie, 21 waiting

Mem: 787M Active, 96M Inact, 356M Wired, 499M Cache, 214M Buf, 14G Free
Swap: 3327M Total, 3327M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   23 root        1 -84 -187     0K    16K RUN    108.0H 76.71% irq17: uhci1 uhci4*
   10 root        1 155   52     0K    16K RUN       ???  5.57% idle
 2116 root        2 -26  -26   149M   113M nanslp 1334.5  5.57% chassisd
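
As an illustration only, the check for a busy interrupt thread in this output could be automated along the following lines. This is a minimal sketch, not a Juniper tool: the function name `busy_interrupt_threads` and the 50 percent threshold are assumptions, and it assumes the 11-column row layout shown above (no per-CPU column).

```python
def busy_interrupt_threads(output, threshold=50.0):
    """Return (command, wcpu) for irq threads at or above threshold percent.

    Hypothetical helper; assumes the column layout
    PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND.
    """
    hits = []
    for line in output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its embedded spaces.
        fields = line.split(None, 10)
        # Process rows have 11 columns and start with a numeric PID.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        wcpu, command = fields[9], fields[10]
        if not wcpu.endswith("%") or not command.startswith("irq"):
            continue
        value = float(wcpu.rstrip("%"))
        if value >= threshold:
            hits.append((command, value))
    return hits


# Rows copied from the output above.
sample = """\
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   23 root        1 -84 -187     0K    16K RUN    108.0H 76.71% irq17: uhci1 uhci4*
   10 root        1 155   52     0K    16K RUN       ???  5.57% idle
 2116 root        2 -26  -26   149M   113M nanslp 1334.5  5.57% chassisd
"""

print(busy_interrupt_threads(sample))  # [('irq17: uhci1 uhci4*', 76.71)]
```

Only rows whose command starts with "irq" are considered, so ordinary daemons such as chassisd are ignored even when they are busy.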

User@PTX3000-re0> show chassis routing-engine

Routing Engine status:
  Slot 0:
    Current state                  Master
    Election priority              Master
    Temperature                 34 degrees C / 93 degrees F
    CPU temperature             56 degrees C / 132 degrees F
    DRAM                      16359 MB (16384 MB installed)
    Memory utilization          11 percent
    5 sec CPU utilization:
      User                       6 percent
      Background                 0 percent
      Kernel                    13 percent
      Interrupt                 78 percent <--------------------- High
      Idle                       3 percent
    1 min CPU utilization:
      User                       2 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                 77 percent <--------------------- High
      Idle                      12 percent
    5 min CPU utilization:
      User                       2 percent
      Background                 0 percent
      Kernel                     8 percent
      Interrupt                 77 percent <--------------------- High
      Idle                      13 percent
    15 min CPU utilization:
      User                       1 percent
      Background                 0 percent
      Kernel                     8 percent
      Interrupt                 77 percent <--------------------- High
      Idle                      14 percent
    Model                          RE-DUO-2600
    Serial ID                      P737A-006537
    Start time                     2017-03-06 11:40:00 JST
    Uptime                         916 days, 23 hours, 25 minutes, 12 seconds
    Last reboot reason             Router rebooted after a normal shutdown.
    Load averages:                 1 minute   5 minute  15 minute
                                       1.24       0.88       0.82
Routing Engine status:
  Slot 1:
    Current state                  Backup
    Election priority              Backup
    Temperature                 42 degrees C / 107 degrees F
    CPU temperature             63 degrees C / 145 degrees F
    DRAM                      16359 MB (16384 MB installed)
    Memory utilization          10 percent
    5 sec CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     0 percent
      Interrupt                  0 percent
      Idle                     100 percent
    Model                          RE-DUO-2600
    Serial ID                      P737A-005826
    Start time                     2017-03-06 11:18:04 JST
    Uptime                         916 days, 23 hours, 46 minutes, 50 seconds
    Last reboot reason             Router rebooted after a normal shutdown.
    Load averages:                 1 minute   5 minute  15 minute
                                       0.00       0.00       0.00
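
The Interrupt lines marked "High" above can also be picked out programmatically. The following is a rough sketch under stated assumptions: the helper name `interrupt_by_window` and the 50 percent cut-off are hypothetical, and the parser relies only on the "Slot N:", "<window> CPU utilization:" and "Interrupt N percent" line shapes seen in this output.

```python
import re


def interrupt_by_window(output):
    """Map (slot, window) -> interrupt percent.

    Hypothetical helper for 'show chassis routing-engine' text output.
    """
    result = {}
    slot = window = None
    for line in output.splitlines():
        text = line.strip()
        m = re.match(r"Slot (\d+):", text)
        if m:
            # A new RE slot starts; forget the previous averaging window.
            slot, window = int(m.group(1)), None
            continue
        m = re.match(r"(\d+ (?:sec|min)) CPU utilization:", text)
        if m:
            window = m.group(1)
            continue
        m = re.match(r"Interrupt\s+(\d+) percent", text)
        if m and slot is not None and window is not None:
            result[(slot, window)] = int(m.group(1))
    return result


# Abbreviated excerpt of the output above.
sample = """\
Routing Engine status:
  Slot 0:
    5 sec CPU utilization:
      Interrupt                 78 percent <--------------------- High
    1 min CPU utilization:
      Interrupt                 77 percent <--------------------- High
Routing Engine status:
  Slot 1:
    5 sec CPU utilization:
      Interrupt                  0 percent
"""

usage = interrupt_by_window(sample)
high = {key: pct for key, pct in usage.items() if pct >= 50}
print(high)  # {(0, '5 sec'): 78, (0, '1 min'): 77}
```

Here only slot 0 is reported, matching the output above, where the interrupt load is on RE0 while RE1 is idle.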
Cause:

In this case, the high CPU utilization is caused by an interrupt storm on a suspected failing RE/CB, which overloads or saturates the RE CPU.

The show system boot-messages command output shows that irq 17 is shared by multiple devices (PCI bridges, USB controllers, a SATA controller, and an SMBus controller) that are connected via the PCI bus on the RE. Because the interrupt line is shared, it is hard to identify which device is actually generating the frequent interrupts.

$ grep "irq 17" RSI.log
pcib7: <MPTable PCI-PCI bridge> mem 0xdec00000-0xdec0ffff irq 17 at device 14.0 on pci8
pcib11: <PCI-PCI bridge> irq 17 at device 2.0 on pci11
pcib14: <PCI-PCI bridge> irq 17 at device 6.0 on pci11
pcib17: <PCI-PCI bridge> irq 17 at device 10.0 on pci11
pcib19: <PCI-PCI bridge> irq 17 at device 14.0 on pci11
uhci1: <UHCI (generic) USB controller> port 0x1840-0x185f irq 17 at device 26.1 on pci0
uhci4: <UHCI (generic) USB controller> port 0x18a0-0x18bf irq 17 at device 29.1 on pci0
atapci0: <Intel ICH9 SATA300 controller> port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdeb01000-0xdeb017ff irq 17 at device 31.2 on pci0
ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x1c00-0x1c1f mem 0xdeb02000-0xdeb020ff irq 17 at device 31.3 on pci0
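
To see at a glance which devices share an interrupt line, the boot messages can be grouped by IRQ number. The sketch below is illustrative only: the function name `devices_by_irq` and the regular expression are assumptions based on the "name: <description> ... irq N" line shape in the grep output above, and other boot-message formats may not match.

```python
import re
from collections import defaultdict

# Assumed line shape: "name: <description> ... irq N at device ..."
IRQ_LINE = re.compile(r"^(\w+): <([^>]*)>.*?\birq (\d+)\b")


def devices_by_irq(boot_messages):
    """Group (device name, description) pairs by IRQ number (hypothetical helper)."""
    shared = defaultdict(list)
    for line in boot_messages.splitlines():
        m = IRQ_LINE.match(line.strip())
        if m:
            name, description, irq = m.groups()
            shared[int(irq)].append((name, description))
    return dict(shared)


# Lines copied from the grep output above.
sample = """\
uhci1: <UHCI (generic) USB controller> port 0x1840-0x185f irq 17 at device 26.1 on pci0
uhci4: <UHCI (generic) USB controller> port 0x18a0-0x18bf irq 17 at device 29.1 on pci0
ichsmb0: <Intel 82801I (ICH9) SMBus controller> port 0x1c00-0x1c1f mem 0xdeb02000-0xdeb020ff irq 17 at device 31.3 on pci0
"""

groups = devices_by_irq(sample)
print([name for name, _ in groups[17]])  # ['uhci1', 'uhci4', 'ichsmb0']
```

Grouping shows which devices share the line but not which one raises the interrupts, which is the limitation described above.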

However, the root cause of the irq17 interrupt storm is currently unknown, and it is still unclear whether the issue is related to hardware or software.

If this issue is seen only on one specific RE/CB pair and cannot be reproduced with other RE/CB pairs, it is most likely caused by a defective RE (or CB), which should be replaced through RMA. The RE-DUO-2600 Routing Engine mentioned in this article is sourced from a third-party (OEM) vendor, and it has been marked EOL/EOS since late 2008.

Solution:

The recommended course of action is to restart the RE immediately, which should bring CPU usage back down. If the high CPU condition persists after the restart, suspect a hardware issue on the RE or CB. In that case, identify the defective unit (RE or CB) and replace it through RMA.

Note: The high CPU issue can be seen on both the Primary and Backup REs. If the problem is seen on the Primary RE, the NOC should restart the RE immediately to avoid side effects such as protocol flaps, which can be caused by RE CPU resource exhaustion.

Modification History:
03-27-21: Updated avoid words per I&D guidelines.