[STRM] Dropped event messages

Article ID: KB13554; KB Last Updated: 27 May 2014; Version: 6.0
Summary:

STRM drops events and flows.

Symptoms:

The STRM Dashboard shows notifications that events were dropped or that a limit was reached.

Cause:

Solution:

The STRM event pipeline has multiple levels of data processing. At each of these levels, event processing can become backlogged. When this occurs, the system can buffer messages for a short period of time, but if the buffers become full, events are dropped. Below are a few examples of messages that appear in the System Notification Dashboard screen, with descriptions of the possible causes of the dropped events.

Note: These log messages are written to /var/log/qradar.error on the Console and on managed Event Processors (EPs).


Dropped events message, queue at 0%

Sample message

Feb 11 06:17:21 127.0.0.1 [ecs] [[type=com.eventgnosis.system.ThreadedEventProcessor][parent=qradar.domain.com:ecs0/EC/Processor1/DSM_Normalize]] com.q1labs.semsources.filters.DSMFilter: [WARN] [NOT:0060005100][192.168.2.3/- -] [-/- -] Device Parsing has detected a total of 140407 dropped event(s). 8618 event(s) dropped in the last 900 seconds. Queue is at 0 percent capacity.

Details

This message reports the number of events dropped over a 15-minute (900-second) period. Note that the queue is at 0%. This usually indicates that during the last reporting period there was at least one event-rate spike that filled the input queues faster than the processing threads could drain them, and that the queues have since emptied. Such a spike can be caused by several types of network activity that generate a large number of events. Typically, this occurs only a few times a day, so events go unprocessed and unsaved in only a few instances. However, if it occurs consistently, check your event-limit licensing. You might need additional eps licenses.
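The fields of a dropped-events notification can be extracted with a regular expression when reviewing /var/log/qradar.error. The following is a minimal sketch, not a supported tool: the pattern and the function name parse_dropped are assumptions based only on the wording of the sample message, and other STRM versions may word the message differently.

```python
import re

# Pattern derived from the sample message wording (assumption).
# "were" is optional because some components include it in the message.
DROPPED_RE = re.compile(
    r"has detected a total of (?P<total>\d+) dropped event\(s\)\. "
    r"(?P<recent>\d+) event\(s\) (?:were )?dropped in the last "
    r"(?P<window>\d+) seconds\. "
    r"Queue is at (?P<queue>\d+) percent capacity"
)

def parse_dropped(line):
    """Return the numeric fields of a dropped-events message, or None."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

sample = (
    "Device Parsing has detected a total of 140407 dropped event(s). "
    "8618 event(s) dropped in the last 900 seconds. "
    "Queue is at 0 percent capacity."
)
info = parse_dropped(sample)
print(info)
```

For the sample message, this yields a total of 140407, 8618 events in the last 900 seconds, and a queue at 0 percent.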


Dropped events message, queue at > 0%

Sample message

Feb 11 06:17:21 127.0.0.1 [ecs] [[type=com.eventgnosis.system.ThreadedEventProcessor][parent=qradar.domain.com:ecs0/EC/Processor1/DSM_Normalize]] com.q1labs.semsources.filters.DSMFilter: [WARN] [NOT:0060005100][192.168.5.66/- -] [-/- -]Device Parsing has detected a total of 12304037 dropped event(s). 61658 event(s) dropped in the last 900 seconds. Queue is at 98 percent capacity.

Details

Like the message above, this one reports the number of events dropped in the last 15 minutes. The difference is that the event queue remains close to full (98 percent in this sample). This means that the STRM event pipeline is constantly under load, and the likelihood of dropped events is much higher. If STRM systems are running in this state and these messages appear repeatedly during the day, the cause could be one of the following:

  • Too high an event rate for your system: Most event collectors are rated for up to 5000 events per second. If you are constantly over this event rate, consider adding event processing capacity with an additional event collector/processor.
  • Inefficient sensor device extension: If your event rate is lower than the capacity of your system, perhaps around 2000 eps, and you are using a DSM extension, the RegEx patterns in that extension might be inefficient. Inefficient patterns can reduce the processing rate of your system from 5000 events per second to as low as 1000 events per second. If DSM extensions are in use, disable them for a period of time to determine their impact on your dropped events.
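To compare the two dropped-events cases, it can help to convert each message into an average drop rate and note whether the queue had drained. The sketch below is only an illustration: the function name describe_drops and the simple rule "0 percent means a transient spike, anything higher means sustained load" are assumptions that follow this article's two cases, not a rule built into STRM.

```python
def describe_drops(recent, window_s, queue_pct):
    """Return (average dropped events per second, likely situation)."""
    avg_dropped_eps = recent / window_s
    if queue_pct == 0:
        situation = "transient spike"   # queue has drained again
    else:
        situation = "sustained load"    # queue is still holding events
    return avg_dropped_eps, situation

# Values taken from the two sample messages in this article.
spike = describe_drops(8618, 900, 0)
loaded = describe_drops(61658, 900, 98)
print(round(spike[0], 1), spike[1])
print(round(loaded[0], 1), loaded[1])
```

With the sample values, the first message averages about 9.6 dropped events per second and the second about 68.5, which shows how much heavier the sustained case is.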


Dropped events, event throttle – license key

Sample message

Feb 15 17:02:01 127.0.0.1 [ecs] [9f9d4ab9-b466-495f-9df0-4fb751d6c3e9/SequentialEventDispatcher] com.q1labs.semsources.filters.EventThrottleFilter: [WARN] Events per interval threshold was exceeded 96 percent of the time over the past hour

Details

This message indicates that your system is running at or near the event processing license rate in your STRM licenses. When your system reaches its license capacity, it will also begin to drop events in the pipeline. If this is occurring, you should contact your sales representative to discuss an upgrade to the event rating in your licenses.
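When scanning the log for this condition, the percentage in the message is the value of interest. A minimal sketch follows, assuming the message wording matches the sample above; the function name throttle_percent is hypothetical.

```python
import re

# Pattern derived from the sample EventThrottleFilter message (assumption).
THROTTLE_RE = re.compile(
    r"Events per interval threshold was exceeded (\d+) percent "
    r"of the time over the past hour"
)

def throttle_percent(line):
    """Return the reported percentage as an int, or None if absent."""
    match = THROTTLE_RE.search(line)
    return int(match.group(1)) if match else None

sample = (
    "com.q1labs.semsources.filters.EventThrottleFilter: [WARN] "
    "Events per interval threshold was exceeded 96 percent of the time "
    "over the past hour"
)
pct = throttle_percent(sample)
print(pct)
```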


Performance Degradation: CRE engine has sent (x) event directly to storage. Queue is at 0%

Sample message

Aug 29 15:48:07 88.82.2.205 [ecs] [aa4c44eb-94d9-46b3-9a2a-661198341e5e/SequentialEventDispatcher] com.q1labs.sem.monitors.SourceMonitor: [INFO] [NOT:0000006000][88.82.2.205/- -] [-/- -]Incoming raw event rate (5s: 1177.00 eps), (10s: 1110.50 eps), (15s: 1178.80 eps), (30s: 1199.03 eps), (60s: 1168.30 eps), (300s: 1220.48 eps), (900s: 1220.48 eps). Peak in the last 60s: 1346.00 eps. Max Seen 7387.60 eps. EC Throttles/5s (60s: 0.33). Total EC Throttles in the last 60s: 4. Total EC Throttles: 237612.

Details

This log indicates that something is preventing or slowing progress in the Custom Rules Engine (CRE). It has nothing to do with the license limit. When you see this log, the CRE could not keep up with the event rate, so it wrote events directly to Ariel storage to avoid dropping them. These events are still searchable. The queue in this case is the CRE event queue, not the license limit queue. If you see this log, run the command below to determine what is slowing down the CRE:

/opt/qradar/support/findExpensiveCustomRules.sh

(For information on events dropped in the CRE, see KB19181.)
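The per-window rates in the SourceMonitor sample message can be extracted for trending or comparison against your license limit. This is a minimal sketch, assuming the "(Ns: rate eps)" layout shown in the sample; the function name parse_rates is hypothetical.

```python
import re

# Matches fields such as "(5s: 1177.00 eps)". The throttle field
# "(60s: 0.33)" has no "eps" suffix, so it is deliberately not matched.
RATE_RE = re.compile(r"\((\d+)s: ([\d.]+) eps\)")

def parse_rates(line):
    """Map each averaging window (seconds) to its reported rate (eps)."""
    return {int(window): float(rate) for window, rate in RATE_RE.findall(line)}

sample = (
    "Incoming raw event rate (5s: 1177.00 eps), (10s: 1110.50 eps), "
    "(15s: 1178.80 eps), (30s: 1199.03 eps), (60s: 1168.30 eps), "
    "(300s: 1220.48 eps), (900s: 1220.48 eps). "
    "Peak in the last 60s: 1346.00 eps. Max Seen 7387.60 eps. "
    "EC Throttles/5s (60s: 0.33)."
)
rates = parse_rates(sample)
print(rates[5], rates[900])
```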

Max Seen: The time frame is the time since the event pipeline process was started.

Throttle: A throttle means that in the last 2s you exceeded your eps license limit and the excess events were put into the overflow queue. For example, if your license limit is 10000 eps, up to 20000 events are allowed in a 2s window. If you received 25000 events in the last 2s, the excess 5000 events are put into the overflow queue. This counts as 1 throttle. Let's look at the example:

EC Throttles/5s (60s: 0.33). Total EC Throttles in the last 60s: 4. Total EC Throttles: 237612.

EC Throttles/5s: This is the throttling rate, that is, throttles per 5s, or TP5. In this example, the system has a TP5 rate of 0.33, which means that over the last 60s the system was throttled an average of 0.33 times every 5s. The value is derived from the Total EC Throttles in the last 60s field. Because the throttling rate uses a 5s interval, there are 12 intervals per minute. In the last 60s there were 4 throttles, so the average TP5 = 4/12 = 0.33.

Total EC Throttles in the last 60s: This is the count of the number of throttles in the last 60s.

Total EC Throttles: This is a count of how many times events were throttled since the event pipeline process was started.
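The throttle arithmetic described above can be written out as a short sketch. The function names are hypothetical, and the 2s window and 5s interval are taken from the descriptions in this article.

```python
def overflow_events(received, license_eps, window_s=2):
    """Events pushed to the overflow queue in one throttle window."""
    allowed = license_eps * window_s   # events permitted in the window
    return max(0, received - allowed)

def throttles_per_5s(throttles_in_last_60s):
    """Average throttles per 5s interval (TP5) over the last 60s."""
    intervals = 60 // 5                # 12 intervals per minute
    return throttles_in_last_60s / intervals

# License limit of 10000 eps, 25000 events received in 2s.
print(overflow_events(25000, 10000))
# 4 throttles in the last 60s, as in the sample message.
print(round(throttles_per_5s(4), 2))
```

These reproduce the figures in the text: 5000 events go to the overflow queue, and the TP5 rate is 0.33.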

As long as you are not constantly throttling, you won't drop events. The excess events are put into the overflow queue and are processed after your eps drops back down below your license limit. However, if you are constantly throttling, the overflow queue will fill up and the system will start dropping events. When events are dropped, you will be notified by the log entries, by events injected into the pipeline, and by system notifications. The correlating log entry would look like the following:

Sep 6 14:39:33 10.100.129.130 [ecs] [28237abe-b729-48ce-8628-fbf93e9183af/SequentialEventDispatcher] com.q1labs.sem.monitors.SourceMonitor: [WARN] [NOT:0000004000][10.100.129.130/- -] [-/- -][SyslogSource] has detected a total of 11447 dropped event(s). 9898 event(s) were dropped in the last 60 seconds. Queue is at 100 percent capacity
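The difference between occasional and constant throttling can be illustrated with a simplified model of the overflow queue. This is a sketch under stated assumptions, not STRM's actual implementation: the function name simulate, the queue capacity, and the per-interval numbers are all hypothetical.

```python
def simulate(received_per_interval, license_per_interval, queue_capacity):
    """Return (events left in overflow queue, events dropped)."""
    queued = 0
    dropped = 0
    for received in received_per_interval:
        backlog = queued + received
        processed = min(backlog, license_per_interval)
        backlog -= processed
        # Whatever does not fit in the overflow queue is dropped.
        queued = min(backlog, queue_capacity)
        dropped += backlog - queued
    return queued, dropped

# A single burst: the queue absorbs it and drains afterwards.
burst = simulate([30000, 10000, 10000], 20000, 15000)
# Constant overload: the queue fills, then events are dropped.
constant = simulate([30000, 30000, 30000], 20000, 15000)
print(burst, constant)
```

In this model the single burst ends with an empty queue and no drops, while the constant overload ends with a full queue and 15000 dropped events.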


Notes

  • Also check KB14413 to troubleshoot Event drop messages.
  • For more details on how STRM calculates events per second, see KB21340.