This article walks you through troubleshooting high CPU utilization on EX4300 devices.
When you observe high CPU utilization, perform the following steps to understand the cause and isolate the source.
CPU utilization varies with the load on the device. For example, if the device carries many routes/MAC addresses and runs several routing protocols, higher CPU usage is expected. (Note: This article applies to EX4300 devices only.)
- Check the outputs of show chassis routing-engine from the Command Line Interface and top from the shell.
root> show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Master
Temperature 54 degrees C / 129 degrees F
CPU temperature 54 degrees C / 129 degrees F
DRAM 2048 MB
Memory utilization 49 percent
CPU utilization:
User 1 percent
Background 0 percent
Kernel 3 percent
Interrupt 0 percent
Idle 96 percent
Model EX4300-24T
Serial ID PG3716110055
Start time 2018-06-28 03:06:24 UTC
Uptime 9 days, 2 hours, 4 minutes, 35 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
2.34 2.89 2.94
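If you are collecting these statistics repeatedly, the idle percentage and the 1-minute load average can be extracted with standard shell tools. This is a minimal sketch, assuming the output format shown above (field positions may differ slightly across Junos releases):

```shell
# Extract the CPU idle percentage and the 1-minute load average from
# 'show chassis routing-engine' output piped in on stdin.
# A sketch based on the sample output above, not an official tool.
parse_re_stats() {
  awk '
    /Idle/           { idle = $2 }        # "Idle 96 percent"
    /Load averages:/ { getline; load1 = $1 }  # values are on the next line
    END { printf "idle=%s load1=%s\n", idle, load1 }
  '
}
```

From the Junos shell this could be combined with the cli wrapper, for example: cli -c "show chassis routing-engine" | parse_re_stats.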
Depending on the load on the device, the idle percentage should generally be about 50% or higher, and the load averages should be around 1 to 2. The top output shown next lets you determine which process is causing the high CPU utilization.
- If the PFEX and L2CPD values are high, it may mean that many MAC move, flood, or STP-related events are occurring on the device.
- Chassisd spiking may mean that the issue is related to an interface delete or reconfigure, the temperature of the device, or other chassis operations.
- High MGD usage means that Junos Space, a configuration operation, or a user login session is hogging the CPU.
- High MCSNOOPD usage means that multicast traffic is hogging the switch.
- MIB2D indicates issues with SNMP.
- RPD indicates issues with routing updates.
- Authd indicates a problem with dot1x (802.1X) authentication.
last pid: 12631;  load averages: 2.34, 2.89, 2.94  up 9+02:06:20  05:12:14
70 processes: 2 running, 67 sleeping, 1 zombie
CPU states: 0.6% user, 0.0% nice, 1.0% system, 0.0% interrupt, 98.4% idle
Mem: 703M Active, 66M Inact, 118M Wired, 312M Cache, 112M Buf, 664M Free
Swap:
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1137 root 2 44 -52 560M 200M select 31.2H 16.75% pfex_junos
1235 root 1 107 0 18968K 11656K select 764:21 9.91% ppmd
1246 root 1 4 0 25124K 15616K kqread 472:03 4.98% l2cpd
1262 root 1 4 0 26636K 12876K kqread 149:09 3.03% mcsnoopd
1131 root 1 101 0 46708K 19952K select 111:26 0.10% chassisd
1250 root 1 96 0 86252K 42408K select 49:15 0.00% authd
1224 root 1 96 0 42264K 20400K select 21:58 0.00% mib2d
1138 root 1 4 0 10628K 4088K kqread 17:12 0.00% chassism
1227 root 1 96 0 41976K 26240K select 16:59 0.00% l2ald
1263 root 1 96 0 15464K 8912K select 11:20 0.00% license-check
1249 root 1 96 0 68636K 21988K select 8:57 0.00% jdhcpd
1231 root 1 96 0 29544K 13000K select 8:27 0.00% pfed
1223 root 1 96 0 31524K 19164K select 8:24 0.00% snmpd
1259 root 1 96 0 15192K 3912K select 7:05 0.00% shm-rtsdbd
1234 root 1 96 0 23732K 12656K select 3:37 0.00% cosd
1219 root 1 96 0 2728K 1164K select 3:02 0.00% bslockd
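When the process table is long, the heaviest CPU consumers can be pulled out by filtering on the WCPU column. A minimal sketch, assuming the column layout shown above (WCPU in field 10, COMMAND in field 11):

```shell
# Print the top 3 CPU consumers from saved 'top' output on stdin.
# Assumes the 11-column layout shown above (WCPU = field 10).
top_hogs() {
  awk '$10 ~ /%$/ { print $10, $11 }' | sort -rn | head -3
}
```

Run against a saved copy of the top output, this immediately surfaces processes such as pfex_junos or ppmd when they dominate the CPU.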
- Below are some commands that will help in isolating the issue:
- show system statistics bridge | match mac >>> Check the number of MAC address moves and MAC learning events.
- show system core-dumps >>> See if there are any core files for any daemons.
- show system alarms >>> Check for system alarms.
- show chassis alarms >>> Check for chassis alarms.
- show log messages | last 200 >>> Check recent logs for any abnormal messages or alarm entries.
- See if there are any interfaces that are sending/receiving a high amount of traffic. If you notice any interface spiking, consider disabling the interface and verify CPU utilization again.
root> monitor interface traffic
Interface Link Input packets (pps) Output packets (pps)
ge-0/0/0 Down 226528864551 (0) 16685266 (0)
gr-0/0/0 Up 0 (0) 0 (0)
pfh-0/0/0 Up 0 0
ge-0/0/1 Down 0 (0) 0 (0)
ge-0/0/2 Down 0 (0) 0 (0)
ge-0/0/3 Down 0 (0) 0 (0)
ge-0/0/4 Down 0 (0) 0 (0)
ge-0/0/5 Down 0 (0) 0 (0)
ge-0/0/6 Down 0 (0) 0 (0)
ge-0/0/7 Down 0 (0) 0 (0)
ge-0/0/8 Down 0 (0) 0 (0)
ge-0/0/9 Down 0 (0) 0 (0)
ge-0/0/10 Down 0 (0) 0 (0)
ge-0/0/11 Down 0 (0) 0 (0)
ge-0/0/12 Down 0 (0) 0 (0)
ge-0/0/13 Down 0 (0) 0 (0)
ge-0/0/14 Down 0 (0) 0 (0)
ge-0/0/15 Down 0 (0) 0 (0)
ge-0/0/16 Down 0 (0) 0 (0)
Note: If you notice some interfaces spiking and sending/receiving a high amount of traffic, configure port mirroring on the interface and see what kind of traffic is coming in. If you notice packets such as SSDP / MLD / IPv6 that are not expected, check why the interface is sending/receiving such traffic.
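With many interfaces, scanning the table by eye is error-prone. Saved 'monitor interface traffic' output can instead be filtered for interfaces whose pps values exceed a threshold. A sketch assuming the column layout shown above (pps values in parentheses in fields 4 and 6):

```shell
# Flag interfaces whose input/output pps (the values in parentheses)
# exceed a threshold, from saved 'monitor interface traffic' output.
# A sketch based on the sample output above.
busy_ifaces() {
  thresh=${1:-1000}
  awk -v t="$thresh" '
    $1 ~ /^(ge|xe|et)-/ {
      in_pps  = $4; gsub(/[()]/, "", in_pps)
      out_pps = $6; gsub(/[()]/, "", out_pps)
      if (in_pps + 0 > t || out_pps + 0 > t) print $1, in_pps, out_pps
    }
  '
}
```

Any interface this prints is a candidate for the port-mirroring check described in the note above.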
- Sometimes, monitoring the interface on the switch itself also shows the kind of traffic. You could verify this as well:
monitor traffic interface ge-0/0/0 no-resolve
- Or you could save the capture to a file from the shell and view it in Wireshark after downloading it:
tcpdump -i ge-0/0/0 -s 9000 -w /var/tmp/file.pcap
Now let’s say you identify some traffic or packets flooding the CPU using the above two methods. You can apply a firewall filter on the interface to block this traffic. You can also apply the filter to the loopback interface to protect the Routing Engine.
Example: blocking SNMP traffic from a specific source
[edit firewall family ethernet-switching]
+ filter BLOCK-SNMP {
+ term 1 {
+ from {
+ destination-port snmp; >>> Match only traffic with destination port SNMP (port 161).
+ ip-source-address {
+ 10.210.96.13/32; >>> Only traffic coming from this source IP will be blocked; SNMP traffic from any other IP is unaffected.
+ }
+ }
+ then {
+ discard; >>> Packets matching this term are discarded.
+ count SNMP-BLOCK; >>> A counter to confirm that the SNMP packets are hitting the filter.
+ }
+ }
+ term 2 {
+ then accept; >>> A second term with "accept" so that all other traffic is accepted and is not affected by the filter.
+ }
+ }
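The filter above only takes effect once it is applied to an interface. A minimal sketch of applying it and checking the counter, assuming the offending traffic arrives on ge-0/0/0 (substitute the interface you identified earlier):

```
[edit]
set interfaces ge-0/0/0 unit 0 family ethernet-switching filter input BLOCK-SNMP
commit
run show firewall filter BLOCK-SNMP
```

If the SNMP-BLOCK counter increments, the filter is matching and discarding the unwanted traffic.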
The outputs of the following commands would also be very helpful in understanding the cause of high CPU utilization. Contact Support if further assistance is required.
- show system uptime
- show chassis routing-engine
- show system process extensive
- show task memory detail
- show task io
- show system buffers
- set task accounting on
- show task accounting detail
- show task jobs
- show krt queue
- show krt state
Wait for 30 seconds, execute the above commands a few times, and then run the following command:
set task accounting off
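To gather all of these outputs in one pass for Support, the show commands can be scripted from the Junos shell. A sketch assuming the standard cli -c wrapper is available from the shell; the CLI variable is only a hypothetical override hook for testing off-box. Remember to run set task accounting on first and off afterwards:

```shell
# Collect the CPU-related outputs listed above into a single file.
# A sketch for the Junos shell; 'cli -c' runs one operational command.
collect_cpu_outputs() {
  out=${1:-/var/tmp/cpu-outputs.txt}
  for cmd in "show system uptime" \
             "show chassis routing-engine" \
             "show system process extensive" \
             "show task memory detail" \
             "show task io" \
             "show system buffers" \
             "show task accounting detail" \
             "show task jobs" \
             "show krt queue" \
             "show krt state"; do
    echo "### $cmd" >> "$out"
    ${CLI:-cli -c} "$cmd" >> "$out" 2>&1   # CLI override hook (hypothetical)
  done
}
```

The resulting file in /var/tmp can then be attached to the Support case along with the captures gathered earlier.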