[ScreenOS] Troubleshooting high CPU on a firewall device

Knowledge Base ID: KB9453
Last Updated: 10 Jun 2015
Version: 22.0

Summary:

CPU utilization is extremely high on the Juniper firewall. What is triggering the high CPU situation?

Problem or Goal:

Any packet sent to, passing through, or otherwise processed by the firewall consumes CPU. The firewall starts to experience problems when CPU utilization approaches 85%. Symptoms include:

  • High CPU utilization
  • Poor system or throughput performance
  • OSPF adjacencies or BGP peerings fail
  • Device management is slower than normal
  • Ping to the management interface times out
  • Firewall is not passing traffic
  • Packet drops
  • The 'in overrun' counter (shown by get counter stat) may increment

Solution:

To Troubleshoot a High CPU Situation

Step 1:  Check the CPU utilization.

CPU utilization is calculated from two components: Flow and Task. It is defined as the percentage of time the CPU spends processing rather than sitting idle. High CPU utilization means the CPU is busy processing network traffic; it does not by itself mean that the firewall cannot keep up and will start dropping packets. CPU utilization measures the network load on the firewall, not the throughput of the device itself.

Note: On all firewall appliance devices (NetScreen-5, 25, 50, 204, 208, and SSG Series), a single CPU is used for processing. On ASIC-based hardware firewalls (NS-5000, ISG devices) there are two CPUs: one dedicated to Flow and the other dedicated to Task.

The CLI command get perf cpu detail shows an overview of CPU utilization as a percentage: the overall average, followed by the average for each of the last 60 seconds, each of the last 60 minutes, and each of the last 24 hours:

Sample:
ns5200-> get perf cpu detail
Average System Utilization:  2%
Last 60 seconds:
59:  2    58:  2    57:  2    56:  2    55:  2    54:  2    
53:  2    52:  2    51:  2    50:  2    49:  2    48:  2    
47:  2    46:  2    45:  2    44:  2    43:  2    42:  2    
41:  2    40:  2    39:  2    38:  2    37:  2    36:  2    
35:  2    34:  2    33:  2    32:  2    31:  2    30:  2    
29:  2    28:  2    27:  2    26:  2    25:  2    24:  2    
23:  2    22:  2    21:  2    20:  2    19:  2    18:  2    
17:  2    16:  2    15:  2    14:  2    13:  2    12:  2    
11:  2    10:  2     9:  2     8:  2     7:  2     6:  2    
 5:  2     4:  2     3:  2     2:  2     1:  2     0:  2    

Last 60 minutes:
59:  2    58:  2    57:  2    56:  2    55:  2    54:  2    
53:  2    52:  2    51:  2    50:  2    49:  2    48:  2    
47:  2    46:  2    45:  2    44:  2    43:  2    42:  2    
41:  2    40:  2    39:  2    38:  2    37:  2    36:  2    
35:  2    34:  2    33:  2    32:  2    31:  2    30:  2    
29:  2    28:  2    27:  2    26:  2    25:  2    24:  2    
23:  2    22:  2    21:  2    20:  2    19:  2    18:  2    
17:  2    16:  2    15:  2    14:  2    13:  2    12:  2    
11:  2    10:  2     9:  2     8:  2     7:  2     6:  2    
 5:  2     4:  2     3:  2     2:  2     1:  2     0:  2    

Last 24 hours:
23:  2    22:  2    21:  2    20:  2    19:  2    18:  2    
17:  2    16:  2    15:  2    14:  2    13:  2    12:  2    
11:  2    10:  2     9:  2     8:  2     7:  2     6:  2    
 5:  2     4:  2     3:  2     2:  2     1:  2     0:  2   
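When this output is captured to a file or a monitoring script, it can help to turn it into numbers. The following is a minimal sketch, not a Juniper tool: it assumes the layout shown in the sample above (an "Average System Utilization" line, then "Last ..." headings followed by index: percent cells), and the function name parse_perf_cpu_detail is hypothetical.

```python
import re

# One "index: percent" cell, e.g. "59:  2" (layout assumed from the sample above).
CELL = re.compile(r"(\d+):\s*(\d+)")

def parse_perf_cpu_detail(text):
    """Return (average, sections) for captured `get perf cpu detail` output.

    `sections` maps a heading such as 'Last 60 seconds' to {index: percent}.
    """
    average = None
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Average System Utilization:"):
            match = re.search(r"(\d+)%", line)
            if match:
                average = int(match.group(1))
        elif line.startswith("Last ") and line.endswith(":"):
            # Start a new section; later cell lines are stored under it.
            current = sections.setdefault(line[:-1], {})
        elif current is not None:
            for index, percent in CELL.findall(line):
                current[int(index)] = int(percent)
    return average, sections
```

With the parsed result, the busiest second of the last minute is, for example, max(sections["Last 60 seconds"].values()).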


  • Average System Utilization is the average CPU utilization over the last 24 hours. For example, if the system uptime is 48 hrs 18 minutes, it is the average CPU utilization over the last 24 hours, excluding those 18 minutes.
    • If the system uptime is less than 24 hrs but greater than 1 hr, it is the average over the whole hours of uptime. For example, if the system has been up for 10 hrs 40 minutes, it is the average CPU utilization over 10 hrs (excluding the 40 minutes).
    • If the system uptime is less than 1 hr (for example, 34 minutes 26 seconds), it is the average CPU utilization over the last 34 minutes (excluding the 26 seconds).
    • If the system uptime is less than 1 minute (for example, 48 seconds), it is the average computed over those 48 seconds.
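The rules above amount to rounding the uptime down to the largest whole unit (hours, minutes, or seconds) and capping it at 24 hours. A small sketch of that arithmetic follows; the function name averaging_window_seconds is hypothetical, and the behavior at exactly 1 minute, 1 hour, and 24 hours is an assumption, since the article only gives examples between those boundaries.

```python
MINUTE = 60
HOUR = 3600

def averaging_window_seconds(uptime_seconds):
    """Length, in seconds, of the period the average is computed over."""
    if uptime_seconds < 0:
        raise ValueError("uptime cannot be negative")
    if uptime_seconds < MINUTE:
        # Under one minute: the whole uptime counts.
        return uptime_seconds
    if uptime_seconds < HOUR:
        # Under one hour: whole minutes only, leftover seconds excluded.
        return (uptime_seconds // MINUTE) * MINUTE
    # One hour or more: whole hours only, capped at the last 24 hours.
    return min(uptime_seconds // HOUR, 24) * HOUR

# The four examples from the list above.
print(averaging_window_seconds(48 * HOUR + 18 * MINUTE) // HOUR)    # 24 (hours)
print(averaging_window_seconds(10 * HOUR + 40 * MINUTE) // HOUR)    # 10 (hours)
print(averaging_window_seconds(34 * MINUTE + 26) // MINUTE)         # 34 (minutes)
print(averaging_window_seconds(48))                                 # 48 (seconds)
```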


Step 2:  Determine if the high CPU is caused by Flow or Task.

The command get perf cpu all detail lists the CPU utilization history broken down by Flow and Task. In each entry, the number before the parentheses is the overall utilization; within the parentheses, the first number is the Flow CPU and the second number is the Task CPU.
---------------------------------------------------------------------
nsisg2000-> get perf cpu all detail
Average System Utilization: 55% (61  5)
Last 60 seconds:
59: 86(96  2)*** 58: 85(95  0)**  57: 86(96  2)*** 56: 85(95  0)**
55: 85(95  2)**  54: 86(96  0)*** 53: 86(96  2)*** 52: 86(96  0)***
51: 86(96  2)*** 50: 85(95  1)**  49: 86(96  2)*** 48: 86(96  0)***
47: 86(96  3)*** 46: 86(96  0)*** 45: 86(96  2)*** 44: 86(96  0)***
43: 86(96  2)*** 42: 86(96  0)*** 41: 86(96  2)*** 40: 86(96  0)***
39: 86(96  2)*** 38: 86(96  0)*** 37: 86(96  2)*** 36: 86(96  0)***
35: 86(96  2)*** 34: 86(96  1)*** 33: 85(95  4)**  32: 85(95  0)**
31: 86(96  2)*** 30: 86(96  0)*** 29: 86(96  2)*** 28: 86(96  1)***
27: 86(96  3)*** 26: 86(96  0)*** 25: 86(96  2)*** 24: 86(96  0)***
23: 86(96  2)*** 22: 86(96  0)*** 21: 86(96  2)*** 20: 86(96  0)***
19: 86(96  2)*** 18: 86(96  1)*** 17: 86(96  2)*** 16: 86(96 36)***
15: 86(96  2)*** 14: 86(96  0)*** 13: 85(95  2)**  12: 86(96  0)***
11: 86(96  3)*** 10: 86(96  0)***  9: 86(96  2)***  8: 86(96  0)***
 7: 86(96  3)***  6: 86(96  0)***  5: 85(95  3)**   4: 86(96  1)***
 3: 86(96  2)***  2: 86(96  1)***  1: 86(96  2)***  0: 85(95  0)**

Last 60 minutes:
59: 85(95  1)**  58: 85(95 24)**  57: 84(94  1)**  56: 84(94  1)**
55: 84(94  1)**  54: 84(94  1)**  53: 83(93  1)**  52: 83(93  1)**
51: 82(92  1)**  50: 82(92  1)**  49: 83(93  2)**  48: 82(92  1)**
47: 82(92  1)**  46: 81(91  1)**  45: 81(91  1)**  44: 80(90  2)**
43: 81(91 14)**  42: 79(89 72)**  41: 57(22 66)*   40: 53(19 63)*
39: 53( 1 63)*   38: 53(18 63)*   37: 61(57 65)*   36: 56(34 64)*
35: 59(58 66)*   34: 32(35 11)    33: 26(33  1)    32: 70(80  0)*
31: 66(76  0)*   30: 50(60  0)*   29: 48(58  1)    28: 26(36  4)
27: 24(32  2)    26: 45(54  1)    25: 55(65  1)*   24: 21(30  1)
23: 63(73  0)*   22: 33(40  0)    21: 11(13  1)    20: 53(63  0)*
19: 78(88  1)**  18:  9(13  2)    17: 46(56  2)    16: 19(29  1)
15: 38(48  0)    14: 35(45  1)    13: 63(73  0)*   12: 79(89  0)**
11: 78(88  1)**  10: 36(45  0)     9: 22(27  1)     8: 31(41  0)
 7: 71(81  1)**   6:  5( 6  2)     5:  4( 5  0)     4: 31(39  1)
 3:  5( 5  1)     2: 56(66  0)*    1: 42(52  0)     0: 25(34  2)

Last 24 hours:
23: 44(48 10)    22: 66(74  1)*   21: N/A          20: N/A     
19: N/A          18: N/A          17: N/A          16: N/A     
15: N/A          14: N/A          13: N/A          12: N/A     
11: N/A          10: N/A           9: N/A           8: N/A     
 7: N/A           6: N/A           5: N/A           4: N/A     
 3: N/A           2: N/A           1: N/A           0: N/A

----------------------------------------------------------------------
  • A single asterisk  *  indicates that the CPU is nearing a warning threshold. It is shown when utilization is ≥ 50% and ≤ 70%.
  • Double asterisks  **  indicate that the CPU is nearing a high level; the administrator should investigate the cause. They are shown when utilization is > 70% and ≤ 85%.
  • Triple asterisks  ***  indicate that CPU utilization is high; the administrator should investigate the cause. They are shown when utilization is > 85%.
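To see at a glance whether Flow or Task is the busier side for each sample, the entries can be parsed and compared. The sketch below is illustrative only, with hypothetical helper names (severity, parse_cells, busier_side). It assumes the cell layout of the sample output above, and it assigns the boundary values the way that sample marks them: 70% gets one asterisk, 85% gets two, and the marker follows the overall figure rather than the Flow or Task figure.

```python
import re

# One cell of `get perf cpu all detail`, e.g. "59: 86(96  2)***":
# index, overall %, Flow %, Task %, then zero to three asterisks.
CELL = re.compile(r"(\d+):\s*(\d+)\(\s*(\d+)\s+(\d+)\)(\*{0,3})")

def severity(percent):
    """Number of asterisks expected for an overall utilization value."""
    if percent > 85:
        return 3
    if percent > 70:
        return 2
    if percent >= 50:
        return 1
    return 0

def parse_cells(line):
    """Return one dict per cell on a line; 'N/A' cells are skipped."""
    cells = []
    for index, total, flow, task, stars in CELL.findall(line):
        cells.append({
            "index": int(index),
            "total": int(total),
            "flow": int(flow),
            "task": int(task),
            "stars": len(stars),
        })
    return cells

def busier_side(cell):
    """Return 'flow' or 'task', whichever figure is higher (ties go to 'flow')."""
    return "flow" if cell["flow"] >= cell["task"] else "task"

# A line taken from the "Last 60 minutes" section of the sample above.
line = "43: 81(91 14)**  42: 79(89 72)**  41: 57(22 66)*   40: 53(19 63)*"
for cell in parse_cells(line):
    print(cell["index"], cell["total"], "*" * severity(cell["total"]), busier_side(cell))
```

In this line, minutes 43 and 42 are dominated by Flow, while minutes 41 and 40 are dominated by Task, which is the distinction this step asks for.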

Step 3:  Investigate what could be causing the high CPU.

Step 4:  Collect data and open a case. See KB6987 - What data should I collect to troubleshoot High CPU issues on a Firewall device?

 

Purpose:
Troubleshooting

Copyright© 1999-2012 Juniper Networks, Inc. All rights reserved.