
[EX] Troubleshooting and resolving high CPU utilization on EX4300


Article ID: KB35567 | Last Updated: 26 Mar 2020 | Version: 1.0
Summary:

This article walks you through troubleshooting high CPU utilization on EX4300 devices.

 

Solution:

When you observe high CPU utilization, perform the following steps to understand the cause and isolate the source.

 

CPU utilization varies with the load on the device. For example, if the device carries a large number of routes or MAC addresses and runs several routing protocols, higher CPU utilization can be expected. (Note: This article applies to EX4300 devices only.)

 
  1. Check the outputs of show chassis routing-engine from the CLI and top from the shell (enter the shell with the start shell command).

root> show chassis routing-engine  

Routing Engine status:   
Slot 0:                         
    Current state                  Master     
    Temperature                 54 degrees C / 129 degrees F     
    CPU temperature             54 degrees C / 129 degrees F     
    DRAM                      2048 MB     
    Memory utilization          49 percent     
    CPU utilization:       
      User                       1 percent       
      Background                 0 percent       
      Kernel                     3 percent       
      Interrupt                  0 percent       
      Idle                      96 percent     
    Model                          EX4300-24T     
    Serial ID                      PG3716110055     
    Start time                     2018-06-28 03:06:24 UTC     
    Uptime                         9 days, 2 hours, 4 minutes, 35 seconds     
    Last reboot reason             Router rebooted after a normal shutdown.     
    Load averages:                 1 minute   5 minute  15 minute                                        
                                       2.34       2.89       2.94

Depending on the load on the device, an Idle value of roughly 50 percent or higher is normal, and the load averages should be around 1 or 2.
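To watch the trend rather than a single sample, you can re-run the command a few times or use the refresh pipe option (the 5-second interval below is only an example):

root> show chassis routing-engine | refresh 5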

 

From the process-level output below (top from the shell), you can determine which process is causing the high CPU utilization.

 
  • If the pfex and l2cpd values are high, it may mean that many MAC move, flood, or STP-related events are occurring on the device.

  • A spike in chassisd may mean that the issue is related to interface deletion/reconfiguration, device temperature, or other chassis operations.

  • High mgd usage means that Junos Space activity, configuration operations, or user logins are hogging the CPU.

  • High mcsnoopd usage means that multicast traffic is hogging the switch.

  • mib2d indicates issues with SNMP.

  • rpd indicates issues with routing updates.

  • authd indicates a problem with dot1x (802.1X).

last pid: 12631;  load averages:  2.34,  2.89,  2.94    up 9+02:06:20  05:12:14
70 processes:  2 running, 67 sleeping, 1 zombie
CPU states:  0.6% user,  0.0% nice,  1.0% system,  0.0% interrupt, 98.4% idle
Mem: 703M Active, 66M Inact, 118M Wired, 312M Cache, 112M Buf, 664M Free
Swap:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND 
  1137 root        2  44  -52   560M   200M select  31.2H 16.75% pfex_junos 
  1235 root        1 107    0 18968K 11656K select 764:21  9.91% ppmd 
  1246 root        1   4    0 25124K 15616K kqread 472:03  4.98% l2cpd 
  1262 root        1   4    0 26636K 12876K kqread 149:09  3.03% mcsnoopd 
  1131 root        1 101    0 46708K 19952K select 111:26  0.10% chassisd 
  1250 root        1  96    0 86252K 42408K select  49:15  0.00% authd 
  1224 root        1  96    0 42264K 20400K select  21:58  0.00% mib2d 
  1138 root        1   4    0 10628K  4088K kqread  17:12  0.00% chassism 
  1227 root        1  96    0 41976K 26240K select  16:59  0.00% l2ald 
  1263 root        1  96    0 15464K  8912K select  11:20  0.00% license-check 
  1249 root        1  96    0 68636K 21988K select   8:57  0.00% jdhcpd 
  1231 root        1  96    0 29544K 13000K select   8:27  0.00% pfed 
  1223 root        1  96    0 31524K 19164K select   8:24  0.00% snmpd 
  1259 root        1  96    0 15192K  3912K select   7:05  0.00% shm-rtsdbd 
  1234 root        1  96    0 23732K 12656K select   3:37  0.00% cosd 
  1219 root        1  96    0  2728K  1164K select   3:02  0.00% bslockd
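To check how much CPU a single daemon is consuming from the CLI (without entering the shell), you can filter the process listing; pfex is used here only as an example:

root> show system processes extensive | match pfex
root> show system processes extensive | except 0.00    >>> Lists only processes that are currently consuming CPU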
 
  2. Below are some commands that will help in isolating the issue (a quick filtering example follows this list): 
  • show ethernet-switching mac-learning-log >>> See if MAC addresses are getting deleted and relearned very quickly. 
  • show system statistics bridge | match mac >>> Check the number of MAC address moves and learning.
  • show system core-dumps >>> See if there are any cores for any daemons.
  • show system alarms >>> Check for alarms.
  • show chassis alarms >>> Check for alarms. 
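For example, to check whether one particular MAC address is being learned and deleted repeatedly, you could filter the learning log for it (the MAC address below is only a placeholder):

root> show ethernet-switching mac-learning-log | match 00:11:22:33:44:55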
  3. Check the current logs and see if there are any abnormal logs or alarm messages by using show log messages | last 200.

  4. See if any interfaces are sending or receiving a high amount of traffic. If you notice an interface spiking, consider disabling that interface and verify CPU utilization again. 

root> monitor interface traffic 
Interface    Link  Input packets        (pps)     Output packets        (pps)  
ge-0/0/0    Down   226528864551          (0)         16685266          (0)  
gr-0/0/0      Up              0          (0)                0          (0)  
pfh-0/0/0     Up              0                             0  
ge-0/0/1    Down              0          (0)                0          (0)  
ge-0/0/2    Down              0          (0)                0          (0)  
ge-0/0/3    Down              0          (0)                0          (0)  
ge-0/0/4    Down              0          (0)                0          (0)  
ge-0/0/5    Down              0          (0)                0          (0)  
ge-0/0/6    Down              0          (0)                0          (0)  
ge-0/0/7    Down              0          (0)                0          (0)  
ge-0/0/8    Down              0          (0)                0          (0)  
ge-0/0/9    Down              0          (0)                0          (0)  
ge-0/0/10   Down              0          (0)                0          (0)  
ge-0/0/11   Down              0          (0)                0          (0)  
ge-0/0/12   Down              0          (0)                0          (0)  
ge-0/0/13   Down              0          (0)                0          (0)  
ge-0/0/14   Down              0          (0)                0          (0)  
ge-0/0/15   Down              0          (0)                0          (0)  
ge-0/0/16   Down              0          (0)                0          (0)

Note: If you notice interfaces spiking and sending/receiving a high amount of traffic, configure port mirroring on that interface and check what kind of traffic is coming in. If you notice unexpected packets such as SSDP, MLD, or IPv6 traffic, investigate why the interface is sending/receiving such traffic. A configuration sketch follows.
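As a rough sketch, port mirroring on the EX4300 can be configured with an analyzer; the analyzer name and interfaces below are only placeholders (ge-0/0/0 is the suspect port and ge-0/0/10 connects to the capture device):

[edit]
set forwarding-options analyzer MIRROR-CPU input ingress interface ge-0/0/0.0
set forwarding-options analyzer MIRROR-CPU input egress interface ge-0/0/0.0
set forwarding-options analyzer MIRROR-CPU output interface ge-0/0/10.0
commit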

  5. Sometimes, monitoring the interface on the switch itself also shows the kind of traffic. You can verify this as well:
monitor traffic interface ge-0/0/0 no-resolve     
  6. Alternatively, you could save the capture to a file from the shell and view it in Wireshark after downloading it: 
tcpdump -i ge-0/0/0 -s 9000 -w /var/tmp/file.pcap
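Once the capture is complete, the file can be copied off the switch for analysis; the user name and destination host below are only placeholders:

root> file copy /var/tmp/file.pcap scp://user@192.0.2.10/var/tmp/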
 

Now let’s say you identify flooding traffic or unexpected packets through the above two methods. You can apply a firewall filter on the interface to block this traffic. You can also apply a corresponding filter to the loopback interface to protect the Routing Engine.

 

Example with SNMP

[edit firewall family ethernet-switching] 
+     filter BLOCK-SNMP { 
+         term 1 { 
+             from { 
+                 destination-port snmp;  >>> Only traffic with destination port SNMP will be blocked. 
+                 ip-source-address { 
+                     10.210.96.13/32;    >>> Only traffic coming from this IP is blocked (SNMP traffic from any other IP is not blocked). 
+                 } 
+             } 
+             then { 
+                 discard;                >>> When the term matches, the packet is discarded.
+                 count SNMP-BLOCK;       >>> A counter is included to confirm that SNMP packets are hitting the filter. 
+             } 
+         } 
+         term 2 { 
+             then accept;                >>> A second term with "accept" ensures that all other traffic is not affected by the filter.
+         } 
+     }   
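To put the filter into effect, apply it to the ingress interface (ge-0/0/0 below is only an example), commit, and then check the counter to confirm that SNMP packets are hitting the discard term:

[edit]
set interfaces ge-0/0/0 unit 0 family ethernet-switching filter input BLOCK-SNMP
commit

root> show firewall filter BLOCK-SNMP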

The outputs of the following commands would also be very helpful in understanding the cause of high CPU utilization. Contact Support if further assistance is required. 

  • show system uptime
  • show chassis routing-engine
  • show system processes extensive
  • show task memory detail
  • show task io
  • show system buffers 
  • set task accounting on
  • show task accounting detail 
  • show task jobs
  • show task io
  • show krt queue
  • show krt state
  • show system processes extensive 

Wait for 30 seconds, execute the above commands a few times, and then run the following command:

 set task accounting off 
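When collecting these outputs for JTAC, it can help to save each one to a file on the switch and attach the files to your case; the file names below are only examples:

root> show system processes extensive | save /var/tmp/processes-1.txt
root> show task accounting detail | save /var/tmp/task-accounting.txt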

 
