This article explains memory usage on a Juniper MX device and how to troubleshoot memory-related issues.
Memory-related issues on a Juniper device can trigger various performance-related problems in the network. To troubleshoot the underlying problem, it is important to understand the various components of physical memory.
First, run 'show chassis routing-engine' to verify the current usage. Run the command several times, about one minute apart, to confirm that high memory usage is sustained rather than momentary:
show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Master
Election priority Master (default)
Temperature 32 degrees C / 89 degrees F
CPU temperature 31 degrees C / 87 degrees F
DRAM 16320 MB (16384 MB installed) <-- This is the total physical memory available: 16GB
Memory utilization 8 percent <-- Current usage
5 sec CPU utilization:
User 2 percent
Background 0 percent
Kernel 1 percent
Interrupt 0 percent
Idle 97 percent
1 min CPU utilization:
User 0 percent
Background 0 percent
Kernel 1 percent
Interrupt 0 percent
Idle 99 percent
5 min CPU utilization:
User 0 percent
Background 0 percent
Kernel 1 percent
Interrupt 0 percent
Idle 99 percent
15 min CPU utilization:
User 0 percent
Background 0 percent
Kernel 1 percent
Interrupt 0 percent
Idle 99 percent
Model RE-S-1800x4
Serial ID 9009094177
Start time 2020-12-02 15:53:37 PST
Uptime 13 days, 3 hours, 9 minutes, 41 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
0.40 0.26 0.24
Routing Engine status:
Slot 1:
Current state Backup
Election priority Backup (default)
Temperature 32 degrees C / 89 degrees F
DRAM 16320 MB
Memory utilization 6 percent
5 sec CPU utilization:
User 0 percent
Background 0 percent
Kernel 0 percent
Interrupt 0 percent
Idle 100 percent
Model RE-S-1800x4
Serial ID 9009102182
Start time 2020-12-02 16:02:36 PST
Uptime 13 days, 3 hours, 40 seconds
Last reboot reason Router rebooted after a normal shutdown.
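The advice to take several samples can be sketched in Python. This is only an illustration, assuming each run of 'show chassis routing-engine' has already been saved as a text string. The function names, the 80 percent threshold, and the sample values are assumptions made for the example, not Juniper recommendations.

```python
import re

# Matches lines such as "Memory utilization 8 percent" in the captured output.
MEM_UTIL = re.compile(r"Memory utilization\s+(\d+)\s+percent")


def memory_utilization(output):
    """Return the utilization percentages in one capture, in order of appearance."""
    return [int(value) for value in MEM_UTIL.findall(output)]


def sustained_high(samples, threshold=80, position=0):
    """Return True if every sample is at or above the threshold.

    position selects which Routing Engine block to read: 0 is the first
    block in the output, 1 is the second. The threshold is an assumed value.
    """
    readings = [memory_utilization(sample)[position] for sample in samples]
    return bool(readings) and all(r >= threshold for r in readings)


# Hypothetical captures taken about one minute apart.
samples = [
    "Slot 0:\nMemory utilization 85 percent\nSlot 1:\nMemory utilization 6 percent",
    "Slot 0:\nMemory utilization 88 percent\nSlot 1:\nMemory utilization 6 percent",
    "Slot 0:\nMemory utilization 91 percent\nSlot 1:\nMemory utilization 7 percent",
]

print(sustained_high(samples, position=0))  # first RE block in the output
print(sustained_high(samples, position=1))  # second RE block in the output
```

A single high reading would not satisfy this check; all samples must be at or above the threshold.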
If 'show chassis routing-engine' shows sustained high utilization, the next step is to identify the process consuming the most memory:
show system processes extensive
last pid: 80591; load averages: 0.22, 0.23, 0.23 up 13+03:11:42 19:05:19
370 processes: 5 running, 337 sleeping, 28 waiting
Mem: 99M Active, 3709M Inact, 909M Wired, 479M Buf, 11G Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 64K CPU2 2 311.2H 100.00% idle{idle: cpu2}
11 root 155 ki31 0K 64K CPU1 1 311.2H 100.00% idle{idle: cpu1}
11 root 155 ki31 0K 64K CPU3 3 311.2H 100.00% idle{idle: cpu3}
11 root 155 ki31 0K 64K RUN 0 310.7H 99.37% idle{idle: cpu0}
10410 root 4 0 1495M 701M select 1 722:58 2.59% chassisd
10431 root 20 0 727M 13272K select 0 32:53 0.00% clksyncd
10551 root 20 0 900M 44620K nanslp 2 16:39 0.00% rep-serverd
10550 root 20 0 900M 44616K nanslp 0 16:24 0.00% rep-clientd
10416 root 20 0 800M 63536K select 1 15:48 0.00% mib2d
The above command shows both the memory usage and the CPU cycles consumed by each process. For memory troubleshooting, focus on the Mem and Swap summary lines and on the SIZE and RES columns rather than on WCPU.
Understanding the output:
last pid: 80591; load averages: 0.22, 0.23, 0.23 up 13+03:11:42 19:05:19
370 processes: 5 running, 337 sleeping, 28 waiting
Mem: 99M Active, 3709M Inact, 909M Wired, 479M Buf, 11G Free
Swap: 8192M Total, 8192M Free
The top few lines of the output highlight the total number of processes, both active and inactive (sleeping).
The Mem line breaks down the total available memory into active, inactive, wired, buffer, and free. Adding these components gives approximately 16 GB, the total memory available.
- Active: Memory that is allocated and actively used by programs.
- Inactive: Either memory that is allocated but not recently used or memory that was freed by programs. Inactive memory is still mapped in the address space of one or more processes and, therefore, counts toward the resident set size of those processes.
- Wired: Memory that is not eligible to be swapped, usually used for kernel memory structures and/or memory physically locked by a process.
- Buffer: Size of the memory buffer used to hold data recently read from disk.
- Free: Completely free memory not associated with any programs.
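The sum of these components can be checked with a short Python sketch. It assumes the Mem line has been copied as text and that every value carries a K, M, or G suffix; the function name is hypothetical.

```python
import re

# Conversion factors to megabytes for the suffixes seen in the Mem line.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_mem_line(line):
    """Parse a 'Mem:' summary line into a {label: megabytes} dictionary."""
    fields = re.findall(r"(\d+(?:\.\d+)?)([KMG])\s+(\w+)", line)
    return {label: float(value) * UNIT_MB[unit] for value, unit, label in fields}


line = "Mem: 99M Active, 3709M Inact, 909M Wired, 479M Buf, 11G Free"
pools = parse_mem_line(line)
total_mb = sum(pools.values())

for label, megabytes in pools.items():
    print(f"{label:<8}{megabytes:>10.0f} MB")
print(f"{'Total':<8}{total_mb:>10.0f} MB")
```

For the sample line the total comes to 16460 MB. It does not match the 16384 MB installed exactly because the displayed values are rounded (for example, 11G Free).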
Swap is virtual memory that comes into use under memory pressure. Its pages are stored on disk and are used by processes when no physical memory is left to allocate. Accessing swap adds latency to the process, so swap usage is never a good sign for the memory health of the system.
The kernel is the core component of the operating system and is responsible for assigning resources to the various processes. It uses the pageout daemon to scan current memory usage and free up memory based on the needs of the processes. When a process requests memory pages, the pageout daemon follows the sequence below:
Free memory >> Cache memory >> Inactive memory >> Active memory
Cache memory is freed memory that is not being used by any process and is ready to be reused.
As the sequence shows, the pageout daemon takes memory from the active pool only as a last resort, when no other choice exists.
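The ordering can be illustrated with a toy model in Python. This is not how the pageout daemon is implemented; it only demonstrates the preference order described above. The pool sizes and the function name are assumptions made for the example.

```python
# Preference order taken from the sequence described in the text.
RECLAIM_ORDER = ["free", "cache", "inactive", "active"]


def satisfy_request(pools, request_mb):
    """Draw request_mb from the pools in RECLAIM_ORDER.

    Returns a dictionary showing how much was taken from each pool.
    Any shortfall is reported under the key 'unsatisfied'.
    """
    taken = {}
    remaining = request_mb
    for name in RECLAIM_ORDER:
        if remaining <= 0:
            break
        amount = min(pools.get(name, 0), remaining)
        if amount > 0:
            taken[name] = amount
            remaining -= amount
    if remaining > 0:
        taken["unsatisfied"] = remaining
    return taken


# Hypothetical pool sizes in MB.
pools = {"free": 100, "cache": 50, "inactive": 300, "active": 200}

print(satisfy_request(pools, 120))  # served from free, then cache
print(satisfy_request(pools, 500))  # active is touched only after the others
```

In this model a small request never touches the inactive or active pools, while a large one reaches the active pool only after the other three are exhausted.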
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
10410 root 4 0 1495M 701M select 1 722:58 2.59% chassisd
In the above output, the process ID for chassisd is 10410. SIZE is the total virtual memory size allocated to the process. RES is the resident size of the process in physical memory, that is, the sum of the active and inactive memory currently allocated to the process.
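To find the largest consumers of physical memory, the process rows can be ranked by RES. The Python sketch below assumes the rows have been copied as text in the eleven-column layout of the header line and that SIZE and RES always end in K, M, or G. The helper names are hypothetical.

```python
# Conversion factors to kilobytes for the suffixes used in SIZE and RES.
UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def to_kb(text):
    """Convert a size such as '701M' or '13272K' to kilobytes."""
    return float(text[:-1]) * UNIT_KB[text[-1]]


def parse_process_row(row):
    """Split one row into PID, SIZE, RES, WCPU, and COMMAND fields."""
    # The command may contain spaces, so split into at most 11 fields.
    pid, _user, _pri, _nice, size, res, _state, _cpu, _time, wcpu, command = row.split(None, 10)
    return {
        "pid": int(pid),
        "size_kb": to_kb(size),
        "res_kb": to_kb(res),
        "wcpu": float(wcpu.rstrip("%")),
        "command": command,
    }


rows = [
    "11 root 155 ki31 0K 64K CPU2 2 311.2H 100.00% idle{idle: cpu2}",
    "10410 root 4 0 1495M 701M select 1 722:58 2.59% chassisd",
    "10431 root 20 0 727M 13272K select 0 32:53 0.00% clksyncd",
    "10551 root 20 0 900M 44620K nanslp 2 16:39 0.00% rep-serverd",
    "10416 root 20 0 800M 63536K select 1 15:48 0.00% mib2d",
]

# Rank by resident memory, largest first.
ranked = sorted((parse_process_row(r) for r in rows), key=lambda p: p["res_kb"], reverse=True)
for proc in ranked:
    print(f"{proc['pid']:>6} {proc['res_kb'] / 1024:>8.1f} MB  {proc['command']}")
```

Ranking by RES rather than SIZE reflects physical memory actually held by the process. Here clksyncd has a 727M virtual size but only about 13 MB resident.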
If any process is using an unusually high amount of memory, or multiple instances of the same daemon are being spawned, troubleshoot further to isolate the reason. In many cases, a daemon such as mosquitto (used for telemetry) is spawned multiple times, each instance with a different process ID. In such cases, a single parent process creates multiple child processes. Try the steps below to clear this issue:
- Check whether it is safe to restart the process, or contact JTAC to confirm. If it is, restart the process and check whether the child processes are still being spawned.
- In many cases, the processes are spawned on only one RE, sometimes because of a sync issue with the other RE. Enable GRES and NSR, switch the primary role over to the RE that is not spawning the child processes, and then reboot the other RE.
- If RPD is the process consuming high memory, check the output of 'show task memory'.
show task memory
Memory Size (kB) Percentage When
Currently In Use: 47145 0% now
Maximum Ever Used: 58716 0% 20/12/02 15:56:35
Available: 17092010 100% now
This shows the memory currently in use by RPD, the maximum memory RPD has ever used, and the timestamp at which that maximum was recorded.
- Check the output of 'show task memory detail' for a more detailed breakdown.
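The check for a daemon that has been spawned multiple times under different process IDs can be sketched in Python. It assumes rows copied from 'show system processes extensive' in the eleven-column layout shown earlier. The daemon name "exampled", its PIDs, and its sizes are made up for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def repeated_daemons(rows, min_instances=2):
    """Return {command: instance count} for commands seen under several PIDs.

    Thread rows that share one PID, such as idle{idle: cpu0}, count once.
    """
    pids_by_command = defaultdict(set)
    for row in rows:
        parts = row.split(None, 10)
        # Skip the header line and anything that is not a full process row.
        if len(parts) < 11 or not parts[0].isdigit():
            continue
        command = parts[10].split("{")[0].strip()
        pids_by_command[command].add(int(parts[0]))
    return {
        command: len(pids)
        for command, pids in pids_by_command.items()
        if len(pids) >= min_instances
    }


rows = [
    "PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND",
    "11 root 155 ki31 0K 64K CPU2 2 311.2H 100.00% idle{idle: cpu2}",
    "11 root 155 ki31 0K 64K CPU1 1 311.2H 100.00% idle{idle: cpu1}",
    "10410 root 4 0 1495M 701M select 1 722:58 2.59% chassisd",
    # Hypothetical rows for a daemon spawned three times.
    "20101 root 20 0 50M 4096K select 0 0:01 0.00% exampled",
    "20102 root 20 0 50M 4096K select 1 0:01 0.00% exampled",
    "20103 root 20 0 50M 4096K select 2 0:01 0.00% exampled",
]

print(repeated_daemons(rows))
```

Counting distinct PIDs rather than rows avoids flagging multithreaded processes, whose threads appear as separate rows with the same PID.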