
[EX/QFX] Output tail drops increment on interface although traffic rate is within interface capacity


Article ID: KB36095 | KB Last Updated: 07 May 2021 | Version: 1.0
Summary:

It is common for an interface to experience output tail drops even though the output rate is well below the interface capacity. The most common causes are microbursts and head of line blocking.

This article explains what microbursts are, how they can be detected, and how to resolve or work around them.

Symptoms:

In the example below, output drops can be seen in the output of the show interfaces extensive or show interfaces queue commands even though the output rate (in bps) is well below the link capacity.

root@switch> show interfaces extensive et-0/0/1   
Physical interface: et-0/0/1, Enabled, Physical link is Up
<snip>
  Link-level type: Ethernet, MTU: 9216, LAN-PHY mode, Speed: 100Gbps, BPDU Error: None, Loop Detect PDU Error: None, Ethernet-Switching Error: None,
<snip>
  Traffic statistics:
   Output bytes  :    12177007735137509           9555403168 bps

root@switch> show interfaces extensive et-0/0/0                
Physical interface: et-0/0/0, Enabled, Physical link is Up
<snip>
  Link-level type: Ethernet, MTU: 9216, LAN-PHY mode, Speed: 100Gbps, BPDU Error: None, Loop Detect PDU Error: None, Ethernet-Switching Error: None,
<snip>
  Traffic statistics:
   Output bytes  :    25659486464566573          12160456840 bps

As per the output above, et-0/0/1 and et-0/0/0 are 100Gbps interfaces whose output rates are only about 9.5Gbps and 12Gbps, respectively. However, the show interfaces queue output below shows that the drop rate for queue 0 on et-0/0/1 is around 37Mbps to 40Mbps.

root@switch> show interfaces queue | match "Phy|Total-dropped" | refresh 10    
---(refreshed at XX UTC)---
Physical interface: et-0/0/1, Enabled, Physical link is Up
  Description: PHY|100g
    Total-dropped packets:            5592376999                  3316 pps
    Total-dropped bytes  :         7594355981170              37745192 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps

---(refreshed at YY UTC)---
Physical interface: et-0/0/1, Enabled, Physical link is Up
  Description: PHY|100g 
    Total-dropped packets:            5592413981                  3515 pps
    Total-dropped bytes  :         7594406983451              40216200 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
    Total-dropped packets:                     0                     0 pps
    Total-dropped bytes  :                     0                     0 bps
Cause:

The most common causes for these drops are microbursts and head of line blocking. In this article, we use a simplified topology to explain the difference between average rates, which are shown in the show interface output, and microburst rates.

Note: The data in the example is simplified for illustration purposes. Traffic in most real-life networks is more complicated than in the example given here.

Assume the topology below.

+------------+
|    SW1     |
+------+-----+
       |
       | et-0/0/0
+------+-----+                          +------------+
|    SW3     | et-0/0/2                 |    SW4     |
|            +--------------------------+            |
+------+-----+                          +------------+
       | et-0/0/1
       |
+------+-----+
|    SW2     |
+------------+

Each frame takes 1 msec to transmit or receive. In other words, each interface can send or receive 1000 frames per second. Assume that the data flow is as follows:

Data flow 1

Switch 1---(et-0/0/0)Switch 3(et-0/0/2)---Switch 4

In every second, frames are sent only during the first 150 msec. The interface utilization of et-0/0/0 is therefore 15% in the receive direction.

Data flow 2

Switch 2---(et-0/0/1)Switch 3(et-0/0/2)---Switch 4

In every second, frames are sent only during the first 100 msec. The interface utilization of et-0/0/1 is therefore 10% in the receive direction.

Assume that the egress buffer of et-0/0/2 can hold a maximum of 50 frames.

  • At time=1 msec, et-0/0/0 and et-0/0/1 on Switch 3 each receive one frame. Both frames are placed in the egress buffer of et-0/0/2 on Switch 3.

  • At time=2 msec, et-0/0/2 on Switch 3 sends one frame to Switch 4, which removes one frame from the egress buffer. At the same time, Switch 3 receives one frame each on et-0/0/0 and et-0/0/1, so two frames are added to the egress buffer of et-0/0/2. The net result is that the egress buffer occupancy of et-0/0/2 increases by one.

  • Similarly, the egress buffer occupancy of et-0/0/2 increases by one for every msec from time=3 msec to 50 msec.

  • At time=51 msec, the egress buffer occupancy of et-0/0/2 is 50, which is the maximum buffer allocated to et-0/0/2.

  • From time=52 msec to 100 msec, no more frames can be added to the egress buffer of et-0/0/2, so et-0/0/2 drops one frame per msec. The total number of tail drops is 49.

  • From time=101 msec to 150 msec, Switch 3 receives only one frame per msec, from et-0/0/0 (data flow 1). Thus, the egress buffer occupancy of et-0/0/2 stays at 50.

  • From time=151 msec to 200 msec, Switch 3 does not receive any frames on et-0/0/0 or et-0/0/1, but it still transmits one frame per msec to Switch 4. The net result is that the egress buffer occupancy of et-0/0/2 decreases by one every msec.

In summary, the interface utilization of et-0/0/0 and et-0/0/1 in the receive direction is 15% and 10%, respectively. If Switch 3 dropped no frames, the interface utilization of et-0/0/2 in the transmit direction would be 25% (250 frames per second). Because all frames are received at the beginning of each second (that is, in microbursts), 49 frames are tail dropped every second, which reduces the interface utilization of et-0/0/2 in the transmit direction to roughly 20% (about 201 frames per second).

Solution:

Before describing possible solutions for tail drops caused by microbursts, this section covers the tools that can be used to detect them.

In the example above, the egress buffer builds up during microbursts, so one way to detect microbursts is to look at the egress buffer usage. The section below outlines three ways to identify microbursts on a port.

  1. Port Buffer Monitoring

This feature polls the egress buffer usage of each interface every second and saves the largest value observed. To enable it, configure the following:

set chassis fpc 0 traffic-manager buffer-monitor-enable

To look at egress buffer usage, use the following command:

show interfaces queue buffer-occupancy <interface>

Below is a sample output:

root@switch> show interfaces queue buffer-occupancy et-0/0/4        
Physical interface: et-0/0/4, Enabled, Physical link is Up
  Interface index: 652, SNMP ifIndex: 540
  Description: 
Forwarding classes: 12 supported, 5 in use
Egress queues: 10 supported, 5 in use
            Queue: 0, Forwarding classes: best-effort
                Queue-depth bytes  :
                Peak               : 1970384
            Queue: 3, Forwarding classes: fcoe
                Queue-depth bytes  :
                Peak               : 0
            Queue: 4, Forwarding classes: no-loss
                Queue-depth bytes  :
                Peak               : 0
            Queue: 7, Forwarding classes: PREMIUM-DATA
                Queue-depth bytes  :
                Peak               : 832
            Queue: 8, Forwarding classes: mcast
                Queue-depth bytes  :
                Peak               : 11648
  2. Telemetry

This feature builds on Port Buffer Monitoring and sends the data to an external collector. You need to specify q-mon in the sensor profile. For detailed information, refer to Configuring a Junos Telemetry Interface Sensor (CLI Procedure).
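
The following is only a rough sketch of a native Junos Telemetry Interface sensor that streams the queue monitoring (q-mon) data to a collector. The collector address and port, and the server, export-profile, and sensor names are hypothetical; the resource path and export options should be confirmed against the procedure linked above for your platform and release:

# Hypothetical collector and names; verify the options against the JTI documentation
set services analytics streaming-server COLLECTOR remote-address 192.0.2.1
set services analytics streaming-server COLLECTOR remote-port 22000
set services analytics export-profile EXPORT-PARAMS reporting-rate 2
set services analytics export-profile EXPORT-PARAMS format gpb
set services analytics export-profile EXPORT-PARAMS transport udp
set services analytics sensor QMON-SENSOR server-name COLLECTOR
set services analytics sensor QMON-SENSOR export-name EXPORT-PARAMS
set services analytics sensor QMON-SENSOR resource /junos/system/linecard/qmon/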

  3. Port Mirroring

With port mirroring, you capture packets on the port where the microburst is suspected. After the capture is complete, open it in Wireshark and use the I/O Graph (Statistics > I/O Graph). Set the X-axis interval to milliseconds or microseconds and look at the peaks on the Y axis; microbursts show up as short, tall spikes that stand out clearly on the graph.
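
As a reference sketch, an analyzer session on an EX/QFX switch could mirror the egress traffic of the congested port toward a capture host. Here, et-0/0/1 is the interface with the drops from the Symptoms section, while the analyzer name and the monitor port xe-0/0/47 are hypothetical; verify the analyzer syntax for your platform and release:

# Hypothetical analyzer name and monitor port; adjust to the actual capture setup
set forwarding-options analyzer MICROBURST-CAP input egress interface et-0/0/1.0
set forwarding-options analyzer MICROBURST-CAP output interface xe-0/0/47.0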

After the microbursts are detected, there are several techniques to minimize tail drops:

  1. Increase bandwidth on egress interfaces.

In the example above, the number of tail drops can be minimized if we replace et-0/0/2 with an AE interface.
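
For illustration, the sketch below bundles et-0/0/2 with a second, hypothetical member link (et-0/0/3) into ae0 with LACP. The member interfaces, device count, and family/mode configuration must be adapted to the actual deployment:

# et-0/0/3 and the family settings are placeholders for this example
set chassis aggregated-devices ethernet device-count 1
set interfaces et-0/0/2 ether-options 802.3ad ae0
set interfaces et-0/0/3 ether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching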

  2. Use a deep buffer product.

In the example above, if we replace Switch 3 with another switch whose egress buffer maximum is 100, there will be no tail drop.

  3. Use traffic shaping to throttle traffic from the source.

In the example above, we can implement traffic shaping on Switches 1 and 2 so that each of them transmits only one frame every 2 msec. In that case, there are no tail drops on Switch 3.
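
One possible way to do this is with a CoS scheduler that shapes the traffic on the interfaces of Switches 1 and 2 facing Switch 3. In this sketch, the interface name (et-0/0/10) and the scheduler and scheduler-map names are hypothetical, and 50 percent corresponds to the one-frame-per-2-msec rate used in the example:

# Hypothetical names and interface; 50 percent matches the example rate
set class-of-service schedulers SHAPE-50 shaping-rate percent 50
set class-of-service scheduler-maps SHAPE-MAP forwarding-class best-effort scheduler SHAPE-50
set class-of-service interfaces et-0/0/10 scheduler-map SHAPE-MAP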

  4. Increase the egress buffer pool.

On QFX5000 Series switches, the egress buffer is divided into three partitions: lossless, lossy, and multicast. The lossless partition is typically used only by FCoE, which most customers do not run. If the lossless partition is not used, allocate as much buffer as possible to the lossy partition.
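
The egress partition sizes can be adjusted under the shared-buffer hierarchy. The percentages below only illustrate shifting buffer away from an unused lossless partition toward the lossy partition; check the QFX5000 shared-buffer documentation for the defaults and for how the partitions must add up on your platform and release:

# Illustrative percentages only; verify platform defaults and constraints first
set class-of-service shared-buffer egress buffer-partition lossless percent 5
set class-of-service shared-buffer egress buffer-partition lossy percent 85
set class-of-service shared-buffer egress buffer-partition multicast percent 10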

  5. Configure a classifier and a scheduler.

Configure a classifier to place mission-critical application traffic into a queue other than the best-effort queue, and then allocate sufficient bandwidth and buffer to that queue. This results in fewer drops for mission-critical traffic at the expense of other traffic that uses the best-effort queue.
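
A minimal sketch of this approach follows, assuming the mission-critical traffic is marked with DSCP AF41. The forwarding-class, classifier, scheduler, and scheduler-map names, the queue number, the percentages, and the interface names are all hypothetical and must be adapted to the actual design:

# All names, percentages, and interfaces below are placeholders
set class-of-service forwarding-classes class CRITICAL-DATA queue-num 5
set class-of-service classifiers dscp CRITICAL-CLASS forwarding-class CRITICAL-DATA loss-priority low code-points af41
set class-of-service interfaces et-0/0/0 unit 0 classifiers dscp CRITICAL-CLASS
set class-of-service schedulers CRITICAL-SCHED transmit-rate percent 40
set class-of-service schedulers CRITICAL-SCHED buffer-size percent 40
set class-of-service scheduler-maps CRITICAL-MAP forwarding-class CRITICAL-DATA scheduler CRITICAL-SCHED
set class-of-service interfaces et-0/0/2 scheduler-map CRITICAL-MAP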
