[QFabric] QFX3000-M and QFX3600-I IC replacement impact analysis

  [KB34940]


Summary:
This article discusses the impact of replacing a QFX3600-I InterConnect (IC) device on a QFX3000-M QFabric system.

The tests described in this article were carried out in a lab environment using an IXIA traffic generator. A lab environment is generally less complex than a production environment. This article aims to quantify the amount of packet loss; it does not examine how services might be impacted. Service impact depends on the design, the protocols used, and how they handle and recover from packet loss.

Note: The impact of an IC crash or replacement on a QFabric QFX3000-G is different.

Symptoms:

A possible symptom is that a fan fails and, even after it is replaced, the new fan does not work either. Troubleshooting then determines that the slot, not the fan, is faulty.

Cause:

The QFX3600-I chassis might need to be replaced or rebooted following troubleshooting or RMA.

Solution:

Impact of IC Replacement by Different Scenarios

Scenario 0 - power-off the IC:

The recommended procedure can be found in the following technical document:
Adding or Replacing an Interconnect Device in a QFX3000-M QFabric System

Additionally, tests conducted in the lab pushed 20 Gbps through the IC with minimal packet loss. Powering off the device and halting it produced the same results. In the lab, traffic failed over gracefully to the other IC device within 10 to 15 seconds, with no packet loss observed.

Note: This was a lab with a clean install. You should expect and plan for some packet drops in a live working scenario.

The following commands were used on the IC; all of them produced the same results:

  • request system power-off in 0
  • request system halt in 0
  • request system reboot in 0

Scenario 1 - Disable the interfaces on the node devices 1 by 1

This may complicate and/or extend your maintenance window.
Once the IC chassis is replaced, you need to go back to each node device and enable the interfaces 1 by 1.

Scenario 2 - Disable the interfaces on the IC device 1 by 1

This may extend your maintenance window.
You need to log in to the IC device as root and then start the CLI in order to access the configuration from outside the DG (Director Group).
This is not the recommended method of replacing the IC device.
If you do this, traffic is impacted at several moments (once per interface, during each interface failover).

Scenario 3 - Disable all the fte interfaces on the IC side.

This is comparable to powering off the device, but it gives you a quick way to restore traffic if the service impact is greater than expected.
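As a rough illustration of Scenario 3, the sketch below generates the Junos configuration statements that disable a set of fte interfaces, together with the statements that re-enable them for a quick rollback. The interface names and the helper functions are hypothetical and not taken from this article; the sketch assumes the standard Junos `set interfaces <name> disable` and `delete interfaces <name> disable` statements, each batch followed by a `commit`. Replace the names with the fte interfaces actually present on your IC device.

```python
def disable_commands(interfaces):
    """Return Junos statements that administratively disable each interface."""
    return [f"set interfaces {name} disable" for name in interfaces]


def enable_commands(interfaces):
    """Return Junos statements that remove the disable flag again."""
    return [f"delete interfaces {name} disable" for name in interfaces]


if __name__ == "__main__":
    # Hypothetical interface names, used only for illustration.
    fte_interfaces = [f"fte-0/1/{port}" for port in range(4)]

    print("# Disable all fte interfaces in a single commit")
    for line in disable_commands(fte_interfaces) + ["commit"]:
        print(line)

    print("# Rollback: re-enable all fte interfaces in a single commit")
    for line in enable_commands(fte_interfaces) + ["commit"]:
        print(line)
```

Keeping the rollback statements prepared before the maintenance window starts is what makes this scenario faster to revert than a power-off.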

Impact of IC Replacement by Disabling Interfaces

  • Frame drops are observed for L2 unicast/multicast traffic when the fte* interfaces from the interconnect device are disabled.
  • These frame drops stop when the traffic is failed over to the remaining interconnect device.
  • No frame drops are observed when the interfaces are enabled again, placing traffic back on the IC device.
  • The total number of dropped frames is inversely proportional to the frame size.
  • The percentage of dropped frames during failover, relative to the total frames transmitted in a 60s interval, is roughly constant (about 0.02% to 0.03%) regardless of frame size, which suggests that the number of dropped frames is a function of frame size.
  • By measuring the total frame count transmitted in a 60s interval under constant load, one can calculate the number of frames per second going through the IC device.
  • Failover time is calculated to be around 14 ms, based on the fps number and the total dropped frame count.
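One way to check the inverse-proportionality claim is to multiply each frame size by its dropped frame count: if the relationship holds, the product should stay roughly constant. The sketch below does this with the dropped frame counts reported in the results table of this article; the function and variable names are illustrative only.

```python
# (frame size in bytes, dropped frame count), copied from the results table
DROPS = [
    (128, 124363),
    (300, 66438),
    (500, 31668),
    (1024, 18545),
    (1518, 12139),
    (4000, 4786),
    (9000, 1934),
]


def dropped_bytes(samples):
    """Return frame_size * dropped_frames for each (size, dropped) sample."""
    return [size * dropped for size, dropped in samples]


if __name__ == "__main__":
    products = dropped_bytes(DROPS)
    for (size, dropped), product in zip(DROPS, products):
        print(f"{size:>5} B x {dropped:>6} frames = {product:>10} bytes")
    # A ratio close to 1 means the product is nearly constant.
    print(f"max/min ratio: {max(products) / min(products):.2f}")
```

While the frame size varies by a factor of about 70 across the test runs, the product varies by less than a factor of 1.3, which is consistent with a failover time that does not depend on frame size.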

The following drop counts were observed at the moment traffic failed over, under these conditions:

  • ≈6s commit time (± 1s error) on the IC device
  • Constant load of 5 Gbps Tx and 5 Gbps Rx going through the IC device (10 Gbps of traffic in aggregate)
  • A Total Frame Count span of ≈60s (± 1s error), during which all fte* interfaces on the IC device are disabled at the ≈30s mark (± 1s error)
 
Frame Size   Dropped Frame   Total Frame Count   % Dropped Packets   Frames per Second   Failover Time
(bytes)      Count           (in 60s)            (in 60s)            (fps)               (in ms)
128          124363          520366820           0.0238991           8672780.33          14.33946154
300          66438           240791244           0.02759153          4013187.4           16.55492091
500          31668           147839915           0.02142047          2463998.58          12.85228012
1024         18545           74386178            0.02493071          1239769.63          14.95842413
1518         12139           50165216            0.02419804          836086.933          14.51882516
4000         4786            19271480            0.02483463          321191.333          14.90077565
9000         1934            8529814             0.02267341          142163.567          13.60404811

% Dropped Packets (in 60s) = Dropped Frame Count × 100% ÷ Total Frame Count (in 60s)
fps = Total Frame Count (in 60s) ÷ 60s
Failover Time (in ms) = Dropped Frame Count × 1000ms ÷ fps
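The three formulas above can be reproduced with a short script. The sketch below recomputes the derived columns from the dropped and total frame counts in the table; the function and variable names are illustrative only.

```python
# (frame size in bytes, dropped frame count, total frame count in 60 s),
# copied from the table above
MEASUREMENTS = [
    (128, 124363, 520366820),
    (300, 66438, 240791244),
    (500, 31668, 147839915),
    (1024, 18545, 74386178),
    (1518, 12139, 50165216),
    (4000, 4786, 19271480),
    (9000, 1934, 8529814),
]

INTERVAL_S = 60  # length of the measurement window in seconds


def derive(dropped, total, interval_s=INTERVAL_S):
    """Return (% dropped, frames per second, failover time in ms)."""
    pct_dropped = dropped * 100 / total
    fps = total / interval_s
    failover_ms = dropped * 1000 / fps
    return pct_dropped, fps, failover_ms


if __name__ == "__main__":
    for size, dropped, total in MEASUREMENTS:
        pct, fps, ms = derive(dropped, total)
        print(f"{size:>5} B  {pct:.7f} %  {fps:12.2f} fps  {ms:6.2f} ms")
```

Note that the failover time follows directly from the drop percentage and the window length (failover time in ms = % dropped ÷ 100 × 60000), which is why a roughly constant drop percentage implies a roughly constant failover time.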
