As mentioned in KB23726 - [Junos] How to monitor Routing Engine's CPU Load Averages using SNMP, a Routing Engine's CPU load average can be monitored using SNMP in 1 minute time spans.
This article explains the value of 'jnxOperating1MinLoadAvg'. It is the CPU load average over the last 1 minute, shown as a percentage value; the value is zero if unavailable or inapplicable. An example is provided below. Sometimes the value is over 100. This may be expected, depending on the production network environment. This value has a different meaning from the actual CPU utilization percentage.
> show system processes extensive |except 0.00%|no-more
last pid: 91449; load averages: 0.93, 0.62, 0.45 up 482+12:22:23 17:28:46
157 processes: 7 running, 134 sleeping, 16 waiting
Mem: 552M Active, 244M Inact, 239M Wired, 1207M Cache, 69M Buf, 1261M Free
Swap: 2048M Total, 2048M Free
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 52 0K 12K RUN 9774.7 81.20% idle
2885 root 2 8 -88 122M 22232K nanslp 828.9H 6.88% chassisd
2503 root 1 96 0 382M 328M RUN 214.6H 1.12% rpd
20 root 1 -68 -187 0K 12K RUN 133.8H 0.93% irq10: em0 em1+++*
14 root 1 -40 -159 0K 12K WAIT 78.5H 0.05% swi2: netisr 0
{MASTER}
> show snmp mib walk jnxOperatingEntry | match LoadAvg.9.1.0.0
jnxOperating1MinLoadAvg.9.1.0.0 = 117
jnxOperating5MinLoadAvg.9.1.0.0 = 67
jnxOperating15MinLoadAvg.9.1.0.0 = 46
Routing Engine 0 REV 10 740-013063 9009040972 RE-S-2000
ad0 999 MB SILICONSYSTEMS INC 1GB C9433114328209060R07
The CPU load average is the same value that Unix-like systems report in utilities such as top and uptime. It indicates the number of jobs that are in the run queue or waiting for the CPU resource, averaged over 1 minute.
When checking this value, you also need to consider how many cores the RE has. In the scenario described in this article, the RE-S-2000 has only one core, so a 1-minute average of 117 means the CPU was overloaded by 17%. In other words, averaged over the past minute, the demand for CPU exceeded the capacity of the single core by 17%, and that share of the processes or jobs had to wait for the CPU resource.
On REs with multiple cores, a value of 117 is still acceptable. The maximum capacity for a dual-core processor would be 200%. Similarly, on a quad-core processor, 400% is the value that represents 100% usage.
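The per-core reasoning above can be sketched as a small calculation. This is only an illustrative sketch: the function names load_per_core and overload_pct are hypothetical, and the number of cores is assumed to be known from the RE model because it is not part of the SNMP value.

```python
def load_per_core(load_avg_pct: float, cores: int) -> float:
    """Return the load average as a percentage of the total CPU capacity.

    load_avg_pct is the value of jnxOperating1MinLoadAvg (for example 117).
    cores is the number of CPU cores on the RE (assumed known, not polled).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg_pct / cores


def overload_pct(load_avg_pct: float, cores: int) -> float:
    """Return how far the load exceeds full capacity (100% per core), or 0."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return max(0.0, load_avg_pct - 100.0 * cores)


if __name__ == "__main__":
    # Single-core RE-S-2000 from this article: overloaded by 17%.
    print(load_per_core(117, 1), overload_pct(117, 1))
    # The same reading on a dual-core RE stays within capacity.
    print(load_per_core(117, 2), overload_pct(117, 2))
```

With the reading from this article, a single core gives an overload of 17, while the same reading on two cores gives a per-core load of 58.5 and no overload.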
The CPU load is different from the CPU utilization shown in the 'show system processes extensive' output.
For a better understanding of this topic, see Wikipedia's article on Load (computing).
When the value of 'jnxOperating1MinLoadAvg' is more than 100, it does not mean that the current CPU utilization is over 100%. This can be expected, depending on the production environment.
To check the CPU utilization as a percentage, use 'jnxOperatingCPU':
> show snmp mib walk jnxOperatingCPU |match 9.1.0.0
jnxOperatingCPU.9.1.0.0 = 3
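For anyone who collects this output with a script, the following is a minimal sketch of how lines in the 'name.index = value' form shown in this article could be parsed to compare the load average with the CPU utilization. The function name parse_walk and the regular expression are assumptions for illustration only; they are not part of Junos or of any SNMP library.

```python
import re

# Matches lines such as "jnxOperating1MinLoadAvg.9.1.0.0 = 117".
LINE_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9]+)\.(?P<index>\d+(?:\.\d+)*)\s*=\s*(?P<value>\d+)$"
)


def parse_walk(text: str) -> dict:
    """Map (object name, index) to the integer value for each matching line."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group("name"), match.group("index"))
            result[key] = int(match.group("value"))
    return result


if __name__ == "__main__":
    sample = """
    jnxOperating1MinLoadAvg.9.1.0.0 = 117
    jnxOperating5MinLoadAvg.9.1.0.0 = 67
    jnxOperating15MinLoadAvg.9.1.0.0 = 46
    jnxOperatingCPU.9.1.0.0 = 3
    """
    values = parse_walk(sample)
    load = values[("jnxOperating1MinLoadAvg", "9.1.0.0")]
    cpu = values[("jnxOperatingCPU", "9.1.0.0")]
    print(f"1-minute load average: {load}, CPU utilization: {cpu}%")
```

Keeping the two values side by side makes it clear that a load average above 100 can coexist with a low CPU utilization percentage.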