
[MX] Sampled process Crash/Discrepancy

Article ID: KB35853  KB Last Updated: 02 Jun 2020  Version: 1.0
Summary:

Traffic sampling enables you to copy traffic to a Physical Interface Card (PIC) that performs flow accounting while the router forwards the packet to its original destination. You can configure the router to perform sampling in one of the following three locations:

  • On the Routing Engine, using the sampled process.

  • On the Monitoring Services, Adaptive Services, or Multiservices PIC.

  • On an inline data path without the need for a services Dense Port Concentrator (DPC). 

Cause:
  1. High utilization that drives up CPU usage of the sampled process.
  2. No configuration under the forwarding-options hierarchy.
  3. SRRD (Sampling Route-Record Daemon) is explicitly disabled in the configuration.
  4. The sampled process fails to start automatically after a router reload.
  5. Memory issues (rlimit).
Solution:

Verify if there are alarms:

user@host> show chassis alarms


Verify the CPU utilization of the sampled process:

user@host> show system processes extensive | match sampled

If the above command does not display any output, the sampled process failed to start during the last reboot or process initialization.

If you see sampled running at a high CPU percentage, try restarting the process. Make a note of its PID in the output of "show system processes extensive".
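
When this check is run repeatedly, picking the PID and CPU figure out of the output can be scripted off-box. The following is a minimal Python sketch, assuming a top-style line where the PID is the first field, the CPU figure is a field ending in "%", and the command name is the last field. The column layout can differ between Junos releases, the function names are hypothetical, and the sample line is invented for illustration, not taken from a device.

```python
def parse_process_line(line):
    """Return (pid, cpu_percent) from one top-style process line.

    Assumption: the PID is the first field and the CPU figure is the
    last field that ends in '%'. Returns None if no PID is found.
    """
    fields = line.split()
    if not fields or not fields[0].isdigit():
        return None
    pid = int(fields[0])
    cpu = None
    for field in fields:
        if field.endswith("%"):
            try:
                cpu = float(field[:-1])
            except ValueError:
                continue
    return pid, cpu


def find_sampled(output):
    """Scan captured command output; return (pid, cpu) for sampled or None."""
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[-1] == "sampled":
            return parse_process_line(line)
    return None


# Hypothetical sample line, invented for illustration only.
SAMPLE = " 4167 root   1  96   0  12345K  6789K select  0:42  97.50% sampled"

if __name__ == "__main__":
    print(find_sampled(SAMPLE))  # PID and CPU percentage
    print(find_sampled(""))      # None: sampled is not running
```

A result of None corresponds to the empty-output case described above, where sampled did not start.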

Enter shell mode:

root> start shell
root% kill -9 <PID>    <-- Enter sampled PID


To verify that the process is restarting, open another session to the router and enter shell mode. Then enter the following:

root% top    <-- Lists processes that are being run on the device; sampled should be seen in this list


Once this is done, verify CPU utilization again with "show system processes extensive".


Verify if there is any core dump generated for the sampled process:

user@host> show system core-dumps


Verify RE CPU utilization:

user@host> show chassis routing-engine   <-- Useful in RE based sampling


Verify if the sampling configuration exists under the forwarding-options hierarchy:

user@host# edit forwarding-options
[edit forwarding-options]
user@host# show
forwarding-options {
    sampling {
        input {
            rate 4095;
            run-length 0;
            max-packets-per-second 5000;
        }
        family inet {
            output {
                flow-inactive-timeout 15;
                flow-active-timeout 60;
                flow-server 10.xx.xx.xx {
                    port 9991;
                    version 5;
                }
                flow-server 10.xx.xx.xx {
                    port 4567;
                    version 5;
                }
            }
        }
    }
}
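
If the sampling stanza is missing and needs to be added back, the same statements can be entered as "set" commands. The sketch below is a hedged illustration in Python of how the nested configuration above flattens into "set" lines. The helper name to_set_commands is hypothetical, and 192.0.2.10 is a placeholder address from the documentation range standing in for the masked flow-server address.

```python
def to_set_commands(node, path=("forwarding-options",)):
    """Flatten a nested dict of statements into Junos-style 'set' lines."""
    commands = []
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend into a sub-hierarchy, extending the statement path.
            commands.extend(to_set_commands(value, path + (key,)))
        else:
            commands.append("set " + " ".join(path + (key, str(value))))
    return commands


# Values mirror the configuration shown above. The flow-server address
# 192.0.2.10 is a hypothetical placeholder, not a value from the article.
SAMPLING = {
    "sampling": {
        "input": {
            "rate": 4095,
            "run-length": 0,
            "max-packets-per-second": 5000,
        },
        "family inet": {
            "output": {
                "flow-inactive-timeout": 15,
                "flow-active-timeout": 60,
                "flow-server 192.0.2.10": {"port": 9991, "version": 5},
            },
        },
    },
}

if __name__ == "__main__":
    for line in to_set_commands(SAMPLING):
        print(line)
```

Review the generated lines against your own collector addresses and ports before committing them.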

The sampled daemon terminates when there is no configuration under the forwarding-options hierarchy. If any feature is active by default (without requiring configuration under the forwarding-options hierarchy), sampled should come up by default, provided nothing prevents the process from starting. If no such configuration is present, consider adding the sampling configuration under the forwarding-options hierarchy, and then check whether sampled appears in the process list.

In certain cases, sampled stops exporting flow records intermittently (mostly in RE-based sampling). In that case, try the following while the issue is occurring:

Enable trace options:

user@host# set forwarding-options sampling traceoptions ?
Possible completions:
+ apply-groups         Groups from which to inherit configuration data
+ apply-groups-except  Don't inherit configuration data from these groups
> file                 Trace file information
  no-remote-trace      Disable remote tracing

Note: Enabling trace options might increase RE CPU utilization.


Dump flow records in a local file before exporting them to the collector:

user@host# set forwarding-options sampling family inet output flow-server <server1_ipaddress> local-dump       

Example:

set forwarding-options sampling traceoptions file re_sample_log
set forwarding-options sampling traceoptions file size 10m
set forwarding-options sampling traceoptions file files 10
set forwarding-options sampling traceoptions flag all
set forwarding-options sampling family inet output flow-server 55.55.55.55 local-dump       
set forwarding-options sampling family inet output flow-server 66.66.66.66 local-dump  <-- when two output flow servers are configured


Check interface statistics where sampling is enabled and verify if the packet count increases.

From the FPC prompt, collect the outputs below at the time of issue:

NPC0(ar1.qpg2 vty)# show pfe statistics sample  <-- Run this command a couple of times, about a minute apart
 PFE Sampling Status:
 PFE-sample (Class 0):
   IPv4 (Protocol 0):
   4294967295 frequency
            1 run length
            0 nexthop id
          0x0 destmask
          128 clip size
            0 packet capture
            0 packets
            0 bytes
 
 PFE-sample (Class 0):
   IPv6 (Protocol 1):
   4294967295 frequency
            1 run length
            0 nexthop id
          0x0 destmask
          128 clip size
            0 packet capture
            0 packets
            0 bytes

 

NPC0(ar1.qpg2 vty)# show sample summary   <-- Run this command a couple of times, about a minute apart
 Sampling statistics summary:
   Max allowed samples per second:      3000
   Cumulative statistics:
     Total samples accepted:      5980758682
     Total samples dropped:                0
   Statistics for the last second:
     Samples accepted:                     0
     Samples dropped:                      0

NPC0(ar1.qpg2 vty)# show sample summary
 Sampling statistics summary:
   Max allowed samples per second:      3000
   Cumulative statistics:
     Total samples accepted:      5980758682
     Total samples dropped:                0
   Statistics for the last second:
     Samples accepted:                     0
     Samples dropped:                      0

NPC0(ar1.qpg2 vty)# show sample summary
 Sampling statistics summary:
   Max allowed samples per second:      3000
   Cumulative statistics:
     Total samples accepted:      5980758682  <-- No change in value
     Total samples dropped:                0
   Statistics for the last second:
     Samples accepted:                     0
     Samples dropped:                      0
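
Comparing the cumulative counter across several captures by eye is error-prone with ten-digit values. The following is a minimal Python sketch, assuming each capture is the saved text of one "show sample summary" run in the format shown above. The function names accepted_counts and is_stalled are hypothetical.

```python
import re

# Matches the cumulative counter line in "show sample summary" output.
ACCEPTED_RE = re.compile(r"Total samples accepted:\s*(\d+)")


def accepted_counts(captures):
    """Extract the cumulative accepted-sample counter from each capture."""
    counts = []
    for text in captures:
        match = ACCEPTED_RE.search(text)
        if match:
            counts.append(int(match.group(1)))
    return counts


def is_stalled(captures):
    """True when at least two readings exist and the counter never moved."""
    counts = accepted_counts(captures)
    return len(counts) >= 2 and len(set(counts)) == 1


if __name__ == "__main__":
    # Three readings with the counter value from the outputs above.
    reading = "     Total samples accepted:      5980758682\n"
    print(is_stalled([reading, reading, reading]))
```

A stalled counter while sampled interfaces are still passing traffic matches the "No change in value" condition highlighted in the output above.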

 

Increase rlimit of sampled to rule out the possibility of memory issues:
WARNING: Please consult with a JTAC engineer before performing the steps below.

Determine the PID of sampled:

root% ps ax | grep sample
4167  XX  SN     0:00.04 /usr/sbin/sampled -N


Make a note of its PID, then check its resource limit. Ideally, the data limit would be 256MB out of 2048MB.

root% cat /proc/4167/rlimit | grep data
data 134217728 2147483648   <-- current and maximum data limits, in bytes
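
The numbers on the data line are byte counts, so it helps to convert them before comparing against a target such as 256MB or 320MB. The following is a small Python sketch, assuming the line has the form "data <current> <maximum>" as in the output above. The function name parse_rlimit_data is hypothetical.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def parse_rlimit_data(line):
    """Parse a 'data <current> <maximum>' line into (current, maximum) in MiB."""
    name, current, maximum = line.split()[:3]
    if name != "data":
        raise ValueError("not a data rlimit line: " + line)
    return int(current) / MIB, int(maximum) / MIB


if __name__ == "__main__":
    print(parse_rlimit_data("data 134217728 2147483648"))  # (128.0, 2048.0)
    print(parse_rlimit_data("data 268435456 2147483648"))  # (256.0, 2048.0)
```

The two sample lines in this article convert to 128 MiB and 256 MiB for the current limit, so check the converted figure from your own device against the limit you expect.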


Rename the sampled binary so that init cannot respawn the process after it is killed:

root% mv /usr/sbin/sampled /usr/sbin/sampled.bak

Kill the sampled process:

root% kill -9 4167

Increase the data limit to 320M, or use "% unlimit" to remove all resource limits:

root% limit datasize 320m

Move sampled.bak back to sampled and restart it in the background:

root% mv /usr/sbin/sampled.bak /usr/sbin/sampled;/usr/sbin/sampled -N &

Determine the new PID for sampled:

root% ps ax | grep sample
4189  p0  IN     0:00.04 /usr/sbin/sampled -N

Verify that the limit on sampled has increased. It should now be 320MB:

root% cat /proc/4189/rlimit | grep data
data 268435456 2147483648   <-- the first value is the current data limit, in bytes


Now, check whether the flow records are being exported as expected. If scripts are running on the device, try disabling them and check whether the issue is resolved.
If the issue is not resolved, please contact your JTAC representative to troubleshoot further.
 
