[QFX] How to collect data to debug high memory utilization by smid process

Article ID: KB37428   Last Updated: 28 Sep 2021   Version: 1.0
Summary:

This article describes how to collect data for troubleshooting high memory utilization by the Subscriber Management Infrastructure Daemon (smid) process on QFX Series devices.

Symptoms:

When a standalone QFX switch, or the Routing Engine of a Virtual Chassis (VC), reports high memory utilization caused by the smid process, perform the steps in the Solution section to collect the data needed for further troubleshooting. In the following example, smid is using 657M of memory:

root@jtac-lab:1% top -SH

last pid: 78703;  load averages:  0.34,  0.27,  0.20    up 845+03:45:20  08:25:34
132 processes: 3 running, 110 sleeping, 1 zombie, 18 waiting
CPU states:     % user,     % nice,     % system,     % interrupt,     % idle
Mem: 1017M Active, 99M Inact, 492M Wired, 139M Cache, 69M Buf, 139M Free
Swap:

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root     171   52     0K    12K RUN       ??? 77.59% idle
53868 root      98    0  1190M   420M RUN     21.8H 16.60% fxpc
53868 root      44  -52  1190M   420M select  21.8H  0.98% fxpc
53976 root      96    0 48116K 13160K select  42:00  0.05% chassisd
   23 root     -68 -187     0K    12K WAIT   441.5H  0.00% irq11: uhci0 em0++*
   13 root     -40 -159     0K    12K WAIT   129.2H  0.00% swi2: netisr 0
   11 root     -20 -139     0K    12K WAIT    70.2H  0.00% swi7: clock sio
39004 root       4    0     0K    12K pslave  19.6H  0.00% peerproxy11000080
40601 root       4    0     0K    12K pslave 879:36  0.00% peerproxy10000080
 1030 root      96    0     0K    12K select 670:51  0.00% jsr_kkcm
37518 root      96    0  8136K  3832K select 533:52  0.00% license-check
    4 root      -8    0     0K    12K -      532:35  0.00% g_down
  635 root     -40 -159     0K    12K WAIT   475:43  0.00% swi2: FXBCMITHRD
   29 root     -16    0     0K    12K client 415:51  0.00% ifstate notify
 1195 root      96    0  5896K  3720K select 360:28  0.00% eventd
37513 root      96    0 12388K  3640K select 312:17  0.00% shm-rtsdbd
   14 root     -16    0     0K    12K -      308:07  0.00% yarrow
37311 root       4  -20 12852K  5644K kqread 226:11  0.00% vccpd
   35 root     171   52     0K    12K pgzero 160:52  0.00% pagezero
37308 root      96    0  5440K  2980K select  99:14  0.00% irsd
   39 root      20    0     0K    12K vnlrum  78:09  0.00% vnlru_mem
   21 root     -64 -183     0K    12K WAIT    71:43  0.00% irq14: ata0
    3 root      -8    0     0K    12K -       69:37  0.00% g_up
   59 root     -16    0     0K    12K -       59:57  0.00% schedcpu
37514 root      96    0   667M   657M select  51:08  0.00% smid         <<<<<<<<<<<<smid is using 657M memory.
   44 root     -16    0     0K    12K psleep  45:36  0.00% vmkmemdaemon
   38 root      20    0     0K    12K syncer  36:39  0.00% syncer
    2 root      -8    0     0K    12K -       25:56  0.00% g_event
   24 root       8    0     0K    12K usbevt  24:51  0.00% usb0
37310 root      96    0  9220K  3760K select  15:10  0.00% sdk-vmmd
   18 root       8    0     0K    12K -       11:11  0.00% kqueue taskq
   54 root     -16    0     0K    12K .       10:03  0.00% ddostasks
37304 root      96    0 43644K 30148K select   9:02  0.00% mgd
37503 root      96    0  7556K  2436K select   8:58  0.00% craftd
  615 root      -8    0     0K    12K mdwait   8:12  0.00% md12
54915 root       4    0 69072K 28148K kqread   8:09  0.00% rpd
37516 root      96    0  4944K  2468K select   8:03  0.00% pmond
Solution:

Perform the following steps:

  1. Check the Routing Engine (RE) memory utilization from the show chassis routing-engine output:

root@qfx-leaf-137> show chassis routing-engine no-forwarding

Routing Engine status:
  Slot 0:
    Current state                  Master
    Temperature                 42 degrees C / 107 degrees F
    CPU temperature             42 degrees C / 107 degrees F
    DRAM                      1953 MB
    Memory utilization          53 percent
    CPU utilization:
      User                      22 percent
      Background                 0 percent
      Kernel                    12 percent
      Interrupt                  3 percent
      Idle                      63 percent
    Model                          QFX Routing Engine
    Serial ID                      BUILTIN
    Start time                     2021-08-12 04:31:08 UTC
    Uptime                         5 hours, 27 minutes, 24 seconds
    Last reboot reason             0x1:power cycle/failure
    Load averages:                 1 minute   5 minute  15 minute
                                       0.82       0.74       0.71

Routing Engine status:
  Slot 1:
    Current state                  Backup
    Temperature                 42 degrees C / 107 degrees F
    CPU temperature             42 degrees C / 107 degrees F
    DRAM                      1953 MB
    Memory utilization          80 percent
    CPU utilization:
      User                      18 percent
      Background                 0 percent
      Kernel                     5 percent
      Interrupt                  0 percent
      Idle                      78 percent
    Model                          QFX Routing Engine
    Serial ID                      BUILTIN
    Uptime                         4 hours, 37 minutes, 20 seconds
    Last reboot reason             0x2000:hypervisor reboot
    Load averages:                 1 minute   5 minute  15 minute
                                       0.21       0.22       0.20
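If you check memory utilization repeatedly, a small filter can pull the per-slot values out of this output. The following is a minimal sketch only. It assumes that a POSIX sh and awk are available in the switch shell, and that the output uses the field labels shown above (Slot, Current state, Memory utilization). The function name parse_re_memory is hypothetical and is not a Junos tool.

```shell
# Hypothetical helper: summarize per-slot RE memory utilization from the text of
# "show chassis routing-engine no-forwarding" read on stdin.
# Assumes the field labels shown in this article; other releases may differ.
parse_re_memory() {
    awk '
        # "Slot 0:" starts a new Routing Engine block; drop the trailing colon.
        $1 == "Slot" { slot = $2; sub(/:$/, "", slot); state = "unknown" }
        # "Current state   Master" -> remember the RE role for this slot.
        $1 == "Current" && $2 == "state" { state = $3 }
        # "Memory utilization   53 percent" -> one summary line per slot.
        $1 == "Memory" && $2 == "utilization" { printf "slot %s %s %s%%\n", slot, state, $3 }
    '
}
```

For example, run cli -c "show chassis routing-engine no-forwarding" | parse_re_memory from the shell (this assumes cli -c is available on your release). For the output above, it prints slot 0 Master 53% and slot 1 Backup 80%.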
  2. Find the process that is consuming the memory (here, smid), along with its PID and memory usage, by running top from the shell:

root@jtac-lab:1% top -SH

In the top -SH output shown under Symptoms, smid (PID 37514) has a resident memory size (RES) of 657M.
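To read the smid PID and memory figures without scanning the whole table, you can filter the top output. The following is a minimal sketch. It assumes that COMMAND is the last column, as in the output above, and smid_from_top is a hypothetical helper name. Because top -H lists threads, the same PID can appear on several rows, so the filter prints each PID only once.

```shell
# Hypothetical helper: print PID, SIZE and RES for rows of top output (stdin)
# whose COMMAND column is exactly "smid". Columns assumed:
# PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
smid_from_top() {
    awk '
        # Numeric PID in field 1, command "smid" in the last field;
        # seen[] suppresses duplicate rows for threads of the same PID.
        $NF == "smid" && $1 ~ /^[0-9]+$/ && !seen[$1]++ {
            print "pid=" $1, "size=" $5, "res=" $6
            found = 1
        }
        # Non-zero exit status when no smid row was found.
        END { exit (found ? 0 : 1) }
    '
}
```

For example, use top -b -S -H 300 | smid_from_top, provided that the top binary on your release supports batch mode (-b). For a row like the smid entry shown under Symptoms, it prints pid=37514 size=667M res=657M.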
  3. Collect two live cores of the smid process about one minute apart. Replace <PID> with the smid PID from step 2:

root@jtac-lab% gcore -c /var/tmp/smid.core.0 <PID>
<after a minute>
root@jtac-lab% gcore -c /var/tmp/smid.core.1 <PID>
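You can wrap the two gcore runs in a function so that the interval and file names stay consistent. The following is a sketch that assumes a POSIX sh. The function name collect_smid_cores and the GCORE and CORE_DIR overrides are hypothetical conveniences. The core file names follow the /var/tmp/smid.core.N pattern used in this article.

```shell
# Hypothetical helper: collect_smid_cores PID [FIRST_INDEX] [INTERVAL_SECONDS]
# Takes two live cores of PID, INTERVAL_SECONDS apart, named smid.core.N and
# smid.core.N+1 under CORE_DIR (default /var/tmp). GCORE defaults to gcore.
collect_smid_cores() {
    pid=$1
    idx=${2:-0}
    interval=${3:-60}
    dir=${CORE_DIR:-/var/tmp}
    gcore_cmd=${GCORE:-gcore}

    # Refuse to run without a numeric PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: collect_smid_cores PID [FIRST_INDEX] [INTERVAL]" >&2
            return 1 ;;
    esac

    # First live core.
    $gcore_cmd -c "$dir/smid.core.$idx" "$pid" || return 1
    # Wait, then take the second core so the two can be compared.
    sleep "$interval"
    $gcore_cmd -c "$dir/smid.core.$((idx + 1))" "$pid" || return 1
    echo "collected $dir/smid.core.$idx and $dir/smid.core.$((idx + 1))"
}
```

Running collect_smid_cores <PID> 0 60 produces smid.core.0 and smid.core.1. A starting index of 2 and an interval of 60 to 120 seconds matches the later collection step.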
  4. Enable smid traceoptions:

root@jtac-lab# set system services subscriber-management traceoptions flag all
root@jtac-lab# set system services subscriber-management traceoptions file size 1m
root@jtac-lab# commit
  5. Restart the smid process.

Caution: Take care when performing this step on a production network. We recommend performing it during a maintenance window.

On a standalone switch:

root@jtac-lab> show system processes | match smid
1698 ?? S 0:20.31 /usr/sbin/smid -N

{master:1}
root@jtac-lab> restart subscriber-management
Subscriber management process started, pid 20124

{master:1}
root@jtac-lab> show system processes | match smid
20124 ?? S 0:00.12 /usr/sbin/smid -N

On a Virtual Chassis (this example assumes that member 1 is the primary Routing Engine):

root@jtac-lab> show system processes member 1 | match smid
20133 ?? S 0:00.10 /usr/sbin/smid -N

{master:1}
root@jtac-lab> restart subscriber-management member 1

fpc1:
--------------------------------------------------------------------------
Subscriber management process started, pid 20145

{master:1}
root@jtac-lab> show system processes member 1 | match smid
20145 ?? S 0:00.09 /usr/sbin/smid -N
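To confirm that the restart took effect, compare the smid PID before and after the restart. A new PID means that a new smid process is running. The following is a minimal sketch with hypothetical function names. It assumes lines in the format shown above, with the PID in the first field and the binary path in the fifth. In the shell, ps ax should print the same columns.

```shell
# Hypothetical helper: print the PID from a line such as
# "1698 ?? S 0:20.31 /usr/sbin/smid -N" (PID first, binary path fifth).
smid_pid() {
    awk '$5 ~ /\/smid$/ { print $1; exit }'
}

# Hypothetical helper: smid_restarted BEFORE_LINE AFTER_LINE
# Succeeds only if smid is running after the restart under a different PID.
smid_restarted() {
    before=$(printf '%s\n' "$1" | smid_pid)
    after=$(printf '%s\n' "$2" | smid_pid)
    if [ -z "$after" ]; then
        echo "smid not running after restart" >&2
        return 1
    fi
    if [ "$before" = "$after" ]; then
        echo "smid PID unchanged ($after); restart not confirmed" >&2
        return 1
    fi
    echo "smid restarted: pid $before -> $after"
}
```

With the standalone example above, smid_restarted '1698 ?? S 0:20.31 /usr/sbin/smid -N' '20124 ?? S 0:00.12 /usr/sbin/smid -N' prints smid restarted: pid 1698 -> 20124.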
  6. After the restart, continue to monitor memory utilization with the show chassis routing-engine command:

root@jtac-lab> show chassis routing-engine no-forwarding

The output has the same format as the output in step 1. Check the Memory utilization value for each Routing Engine slot.
  7. If memory utilization reaches 80 percent or higher, collect two more live cores of the smid process 1-2 minutes apart, using the new smid PID from after the restart. Collecting live cores is safe on a production network.

root@jtac-lab:1% gcore -c /var/tmp/smid.core.2 <PID>
<after 1-2 minutes>
root@jtac-lab:1% gcore -c /var/tmp/smid.core.3 <PID>
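You can also script the 80 percent check so that it can be run periodically. The following is a sketch. It assumes the show chassis routing-engine output format from step 1 and a POSIX sh with awk. The name re_memory_over is hypothetical, and the threshold defaults to 80.

```shell
# Hypothetical helper: re_memory_over [THRESHOLD]
# Reads "show chassis routing-engine" text on stdin, prints each slot whose
# memory utilization is at or above THRESHOLD percent, and returns 0 if any
# slot matched (1 otherwise).
re_memory_over() {
    threshold=${1:-80}
    awk -v t="$threshold" '
        # Track which Routing Engine slot the following fields belong to.
        $1 == "Slot" { slot = $2; sub(/:$/, "", slot) }
        # Compare the numeric utilization against the threshold.
        $1 == "Memory" && $2 == "utilization" && $3 + 0 >= t + 0 {
            printf "slot %s at %s percent\n", slot, $3
            hit = 1
        }
        END { exit (hit ? 0 : 1) }
    '
}
```

For example, run if cli -c "show chassis routing-engine no-forwarding" | re_memory_over 80; then gcore -c /var/tmp/smid.core.2 <PID>; fi (this assumes cli -c is available). For the output in step 1, re_memory_over 80 prints slot 1 at 80 percent.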
  8. Open a support case with JTAC and attach the RSI (request support information output), the /var/log contents from all members, and all live cores of the smid process collected above.

Note: After the logs are collected, you can disable the traceoptions enabled in step 4.
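Before you open the case, you can pack the collected files into a single archive. The following is a sketch with several assumptions. It assumes that tar with gzip support (-z) is available in the switch shell, and bundle_case_files is a hypothetical helper. It only packs files present on the member where it runs, so repeat it on each Virtual Chassis member. You can save the RSI to a file first with request support information | save /var/tmp/rsi.txt (the file name is only an example). Because cores can be large, check free space first, for example with df -h /var/tmp.

```shell
# Hypothetical helper: bundle_case_files ARCHIVE FILE...
# Packs the given files or directories into one gzip-compressed tar archive
# and lists the archive contents so you can confirm what was included.
bundle_case_files() {
    archive=$1
    shift
    if [ $# -eq 0 ]; then
        echo "usage: bundle_case_files ARCHIVE FILE..." >&2
        return 1
    fi
    tar -czf "$archive" "$@" || return 1
    tar -tzf "$archive"
}
```

For example, bundle_case_files /var/tmp/smid-case.tgz /var/tmp/smid.core.* /var/tmp/rsi.txt /var/log creates one archive to upload with the case.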
