[Subscriber-Management] MX Virtual Chassis Best Practices
This article provides guidelines, known behaviors, and best practices for the MX Virtual Chassis (MX-VC). Note that this article assumes a minimum Junos OS release of 15.1, because some of the recommended configuration parameters are not available in older software releases.
The following recommendations pertain to MX-VC deployments:
Routing Engine: Routing Engines (REs) that are installed on two systems on the same platform (MX and MX vs MX and MX2020) should be of the same type and have the same amount of memory. REs with 32 GB of memory have a higher scaling capacity and use 64-bit daemons such as the Subscriber Management Daemon (SMGD). If member0 has 32 GB REs and member1 has 16 GB REs of the same type, SMGD will not be able to sync, because SMGD on the 16 GB member runs in 32-bit mode while SMGD on the 32 GB member runs in 64-bit mode.
FPC/MPC Support: It is possible that newer Flexible PIC Concentrators (FPCs) and Modular Port Concentrators (MPCs) may not be supported in the release that is being used. To be sure that the hardware is compatible with the code, refer to Protocols and Applications Supported on MPCs for MX Series Routers.
Virtual-Chassis VCP location: Virtual-Chassis Control Port (VCP) links should have their own dedicated FPC. This is to avoid control packet congestion or contention on the FPC that is hosting the VC control channel.
Recommended VCP links: In Junos OS releases prior to 17.2, it is recommended that the number of VCP links be a power of 2 (2, 4, 8, or 16). The hash algorithm used to send traffic over the links between members is limited to those counts, so with any other number of links traffic is not spread equally. In Junos OS release 17.2 or later, RLI31252 removes this requirement because the algorithm has changed. With these later releases, an odd number of links such as 3, 5, 7, or 9 can be used and traffic is still distributed as expected.
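The link-count guidance above can be expressed as a small pre-change check. This is a minimal sketch, not a Juniper tool: the function names `parse_release` and `vcp_link_count_ok` are hypothetical, and the release string format (for example `15.1R7`) is an assumption. A single VCP link is not covered by the guidance, so the sketch simply reports it as outside the listed counts on older releases.

```python
import re

# Link counts the article lists as hashing evenly before Junos OS 17.2.
EVEN_HASH_LINK_COUNTS = {2, 4, 8, 16}


def parse_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '15.1R7' or '17.2'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def vcp_link_count_ok(release: str, link_count: int) -> bool:
    """Return True if link_count follows the article's guidance for this release."""
    if link_count < 1:
        raise ValueError("link_count must be at least 1")
    if parse_release(release) >= (17, 2):
        # 17.2 and later: the hash algorithm changed, so any count is acceptable.
        return True
    # Before 17.2: only the counts listed in the article hash evenly.
    return link_count in EVEN_HASH_LINK_COUNTS
```

For example, `vcp_link_count_ok("15.1R7", 3)` returns `False`, while the same count on `"17.2R1"` returns `True`.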
Adding or Removing VCP links: When a new VCP link is added, the ge/xe/et interface is converted to a VCP port. Wait until this process is complete for a single link before adding more links. Do not add two links at the same time, because this can leave the last link processed in a permanently down state. If a link does get into this condition, the recovery procedure is to remove the VCP link and add it again. When adding or removing multiple VCP links, wait approximately 10 seconds before adding or removing the next link.
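When link changes are scripted, the pacing described above can be enforced in code rather than left to the operator. The following is a minimal sketch under stated assumptions: `apply_paced` is a hypothetical helper, and each action is an opaque callable supplied by the caller (for example, a function that issues one VCP link change through whatever automation is in use). The 10-second default mirrors the approximate wait recommended above; the sketch does not itself verify that a link conversion has completed.

```python
import time
from typing import Callable, Iterable


def apply_paced(
    actions: Iterable[Callable[[], None]],
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run each action in order, pausing between consecutive actions.

    Returns the number of actions that were run.
    """
    count = 0
    for action in actions:
        if count > 0:
            # Pause before every action except the first.
            sleep(wait_seconds)
        action()
        count += 1
    return count
```

The `sleep` parameter is injectable so the pacing can be exercised in a dry run without actually waiting.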
Switchover Behavior: Starting with Junos OS release 15.1, when a Virtual Chassis Graceful Routing Engine Switchover (GRES) event occurs between two Routing Engines, the former Master RE reloads automatically before transitioning to Standby. This reload is intentional: it clears state and avoids complicated Master-to-Standby transition logic issues.
GRES Readiness: To confirm whether the VC system is ready for a Virtual Chassis GRES switchover, run the 'request virtual-chassis routing-engine master switch check' command. If the system is not ready, the command displays the reason.
Virtual-Chassis reload: To reload all four REs in a Virtual Chassis, use the 'request system reboot all-members both-routing-engines' command. This reloads both REs in both member systems.
Unified In-Service Software Upgrade (ISSU) in Virtual-Chassis: Before starting an ISSU, be aware of the Link Aggregation Control Protocol (LACP) and Bidirectional Forwarding Detection (BFD) behavior changes that occur during an ISSU upgrade. For LACP, the system will automatically change from fast intervals to slow during the ISSU. This means that the connecting device will need to be changed to slow before the ISSU is initiated. BFD timers will automatically increase as well, but BFD will handle updating the peering devices automatically. No user intervention is needed. See Preparing for a Unified ISSU in an MX Series Virtual Chassis for steps to prepare a VC for an ISSU upgrade.
Routing Engine Replacement: If an RE fails and needs to be replaced, note that a replacement RE received from the factory through RMA has VC disabled in its default configuration. For the RE to come up in a Virtual Chassis, you must follow the instructions for replacing an RE: https://www.juniper.net/documentation/en_US/junos/topics/example/virtual-chassis-mx-series-replacing-routing-engine.html
Customers with high-scale environments running Enhanced Subscriber Management, with or without Virtual Chassis, typically need to make changes to their systems on a regular basis. In many Subscriber Management systems, some aspect of dynamic interface creation is used to terminate subscribers. Interfaces such as et/xe/ge/si/psx/lt may have any combination of subscribers such as DHCPv4/v6, PPPoE, L2TP, and so on. In most cases, an additional layer such as VLAN Demux or IP Demux is used. Some physical interface (IFD) configuration changes can have an impact on existing subscribers as well, and are not VC-specific. For example, changing the MTU or the Hierarchical Scheduler (HS) of an existing IFD causes that IFD to be reprogrammed. This results in a brief loss of traffic or in subscribers moving to a terminating state.
For Virtual Chassis, client-facing interfaces are typically Aggregate Ethernet (AE) bundles. Making changes to multiple AE interfaces at the same time is not recommended on scaled systems. These changes should work fine, but the risk of unforeseen issues does increase. Consider the following examples that can be problematic with scaled subscribers on an MX-VC with AE-terminated subscribers:
Link Flap: AE1 has two legs that consist of ge-0/0/1 (member0) and ge-12/0/1 (member1). All subscriber-related information is mirrored on each FPC, which includes the subscriber Variable Based Flow (VBF), VLAN, CoS, Firewall Filters, and so on. These AE interfaces can scale to thousands of VLAN/Subscriber sessions, so a physical change that causes a flap creates a large amount of churn and work for the system. Although all the changes are eventually updated and processed, there can be traffic loss, and subscribers can be lost if the duration of the impact exceeds their keep-alive timers.
Class of Service (CoS) Change: Another example that can be problematic is when adding or removing the CoS configuration for the Hierarchical Scheduler or changing the Maximum Transmission Unit (MTU) of an AE bundle. Both actions result in a reprogramming of the interfaces that are built on top of the AE. The amount of work that the system needs to do to process the changes can result in subscribers getting lost. The number of subscribers connected to the AE has a direct impact on whether or not a problem may be seen. When subscribers start dropping off, the system will get even busier as it starts to clean up interface state, routes, and database entries for each subscriber. If multiple changes to an AE are made in a single commit, the likelihood of problems increases.
Recommendation: Adding or removing legs to an existing bundle is another potential problem because it can result in reprogramming actions and updates to forwarding that generate a large amount of work for the system. When changing an AE composition, it is recommended to wait at least a minute between changes if multiple updates are being made. This will allow time to clean up sessions on that FPC before adding the leg back into the bundle.
Logical Tunnel (LT) Anchors: Pseudowire (PS) interfaces that terminate subscribers from an MPLS/L2 environment need an anchor interface. The anchor point could have thousands of VLANs and associated subscribers. When making changes to the underlying LT interface that is tied to the PS, or changing the location of an LT interface, there will be an impact on the system. This can result in traffic loss and subscriber churn.
Recommendation: If more than a single LT anchor needs to be updated to a new LT location, it is recommended to carry out the changes during a maintenance window because an impact is expected. Performing changes one at a time and confirming that subscribers recover before moving on to the next one is ideal. This will reduce the chances of running into unforeseen issues.
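The "one change at a time, confirm recovery, then continue" approach recommended above can be sketched as a small control loop for scripted maintenance. This is an illustration under stated assumptions, not a Juniper procedure: `apply_staged`, its parameters, and the polling defaults are hypothetical, and both the change callables and the `recovered` check (for example, comparing the current subscriber count with a pre-change baseline) must be supplied by the caller.

```python
import time
from typing import Callable, Sequence


def apply_staged(
    changes: Sequence[Callable[[], None]],
    recovered: Callable[[], bool],
    poll_seconds: float = 30.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply changes one at a time, continuing only after recovery is confirmed.

    Returns how many changes were applied. Stops early if recovery is not
    confirmed within max_polls checks after a change.
    """
    applied = 0
    for change in changes:
        change()
        applied += 1
        for _ in range(max_polls):
            if recovered():
                break
            sleep(poll_seconds)
        else:
            # Recovery was never confirmed: stop instead of stacking more changes.
            return applied
    return applied
```

If the returned count is lower than the number of planned changes, the remaining changes were deliberately skipped and the system should be investigated before continuing.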
For best practices and performance related recommendations for MX devices running Junos OS releases prior to 15.1, see:
KB29590 - [Subscriber Management] Maximizing Scaling and Performance for MX Series Virtual Chassis.