[QFX] EVPN-VXLAN fabric - Maintenance mode procedure for hitless upgrade

Article ID: KB36427 | Last Updated: 03 Feb 2021 | Version: 1.0
Summary:

This article describes a generic maintenance-mode procedure that results in hitless or minimal-loss operation. Given the current BGP usage in IP fabrics, the traffic-drain procedure below gives predictable outcomes for overall fabric stability.

Main steps of the procedure: 

  • Check the health of the leaf device: no BGP flaps, consistent EVPN route count, no core dumps, no interface flaps, no PFE drops
  • Divert downstream traffic destined to ESI LAGs to the adjacent leaf
  • Divert upstream traffic from the storage ESI LAG to the adjacent leaf
  • Divert upstream traffic from all remaining ESI LAGs to the adjacent leaf
  • Divert traffic of stand-alone interfaces to the adjacent leaf
  • Close the VTEP
  • Save the rescue configuration
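The health check in the first step amounts to comparing two snapshots of the same counters taken some time apart. The Python sketch below is a minimal illustration, assuming the five counters (BGP flaps, EVPN route count, core dumps, interface flaps, PFE drops) have already been collected by some means not covered in this article; `HealthSnapshot` and `health_issues` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSnapshot:
    """Hypothetical counters for one leaf; how they are collected is not shown."""
    bgp_flaps: int
    evpn_route_count: int
    core_dumps: int
    interface_flaps: int
    pfe_drops: int


def health_issues(before: HealthSnapshot, after: HealthSnapshot) -> list[str]:
    """Return the failed checks between two snapshots; an empty list means healthy."""
    issues = []
    if after.bgp_flaps > before.bgp_flaps:
        issues.append("BGP flaps")
    # "Consistent EVPN route count": any change between snapshots is flagged.
    if after.evpn_route_count != before.evpn_route_count:
        issues.append("EVPN route count changed")
    if after.core_dumps > before.core_dumps:
        issues.append("new core dumps")
    if after.interface_flaps > before.interface_flaps:
        issues.append("interface flaps")
    if after.pfe_drops > before.pfe_drops:
        issues.append("PFE drops")
    return issues
```

For example, `health_issues(HealthSnapshot(0, 1200, 0, 0, 0), HealthSnapshot(0, 1200, 0, 0, 0))` returns an empty list, so the node would be considered healthy enough to proceed.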
Solution:

Below is the generic procedure that can be used to put QFX devices into maintenance mode:

++Entering maintenance mode++

  1. Step-nr1: 

    1.0. Deactivate the maintenance import policy-statement on the leafs or spines that share ESIs with the leaf or spine about to enter maintenance mode (MM). (Check every leaf/spine to determine whether it shares any ESI with the node entering MM.)

    1.1. Set the export MM policy with the BGP community for /32 and LPM prefixes, excluding the dummy static Type-5 route so that the tunnel stays up (supported from 20.2R2 onward, in edge-routed designs with Type-5 VRF enabled)

    1.2. Set the export MM policy with the BGP community for Type-1 (T1) routes, keeping the tunnel up
    >> commit nr1
    (North-to-south traffic toward multihomed (MH) servers should move to the other leaf connected to the same Ethernet segment.)
    <wait 6 seconds>
  2. Step-nr2:

    2.1. Disable the PE-CE links on the leaf/spine: set interfaces $pe_ce_links disable

    >> commit nr2
    (South-to-north traffic from multihomed servers should move to the other leaf/spine.)
    [*** node is in maintenance mode  *** ]
    ++ Exiting the maintenance mode ++
  3. Step-nr3: Deactivate the MM export policy statements configured in step 1

    >> commit nr3
    <wait 60 seconds> 
    (Wait until all iBGP/eBGP sessions are established and the EVPN routes from the rest of the fabric are learned.)
  4. Step-nr4: Re-enable the PE-CE links on the leaf/spine: delete interfaces $pe_ce_links disable

    >> commit nr4
    Optional points to consider after step 4:
    - Run the EVPN KPI HB playbook and show the results in the CEM UI (EVPN KPIs were shared with
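  5. Step-nr5: Undo step 1.0 (reactivate the maintenance import policy-statement on the leafs/spines that share ESIs with this node)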

    Example configurations and preparation for maintenance-mode:

    5.1. Grouping the interfaces on the node by single-homed or multihomed connectivity

    To group interfaces with the same connectivity characteristics (single-homed or multihomed), it's recommended to use Junos configuration groups.
    For example, the following groups for interfaces can be defined: 
    set groups GROUP-IF-CFG interfaces <*> disable
    set groups GROUP-STAT-IF interfaces xe-0/0/8 apply-groups GROUP-IF-CFG
    <…>
    set groups GROUP-STRG-IF interfaces xe-0/0/13 apply-groups GROUP-IF-CFG
    <…>
    set groups GROUP-LACP-IF interfaces xe-0/0/21 apply-groups GROUP-IF-CFG
    <…>
    Make sure the interfaces and LAG member interfaces are configured with an up hold-time (the value is in milliseconds, so 120000 is 120 seconds).
    Example: 
    set interfaces xe-0/0/13 hold-time up 120000 down 0

    5.2. Enabling the policy statements used to drain traffic from the leaf (global configuration):

    set protocols bgp group IPFAB-OVERLAY vpn-apply-export
    set policy-options policy-statement REJECT then reject
    set policy-options policy-statement REJECT-T1 term term1 from family evpn
    set policy-options policy-statement REJECT-T1 term term1 from nlri-route-type 1
    set policy-options policy-statement REJECT-T1 term term1 then reject
    set policy-options policy-statement REJECT-T1 then accept
    commit comment "Preparation"

    5.3. Draining the leaf when LAG/bonding interfaces are used (ESI-LAG multihoming):

    set protocols bgp group IPFAB-OVERLAY export REJECT-T1
    commit comment "BGP export REJECT-T1"
    
    set interfaces apply-groups GROUP-STRG-IF
    set interfaces apply-groups GROUP-LACP-IF
    commit comment "LACP interfaces down"

    5.4. Bringing down the VTEP

    delete protocols bgp group IPFAB-OVERLAY export
    set protocols bgp group IPFAB-OVERLAY export REJECT
    commit comment "BGP export REJECT"

    5.5. Saving the rescue configuration:

    run request system configuration rescue delete
    run request system configuration rescue save
    run show system commit
    Then perform the maintenance operation on the node.

    5.6. Undraining the fabric leaf:

    delete protocols bgp group IPFAB-OVERLAY export
    commit comment "BGP export"
    # forerunner
    set interfaces <forerunner interface> apply-groups-except GROUP-STAT-IF
    commit comment "First Static Interface Up"
    # Storage bond up: before proceeding, verify that the remote VTEPs are up
    delete interfaces apply-groups GROUP-STRG-IF
    commit comment "bond Storage Up"
    # bond LACP Up
    delete interfaces apply-groups GROUP-LACP-IF
    commit comment "LACP interfaces up"
    # bond STATIC Up
    delete interfaces apply-groups GROUP-STAT-IF
    commit comment "Static Interface Up"
    # Clean-up
    delete interfaces <forerunner interface> apply-groups-except GROUP-STAT-IF
    commit comment "Clean-up Undrain completed"
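Because the drain and undrain are an ordered series of small commits, it can help to generate them from one place so that the order and the commit comments stay consistent. The Python sketch below only renders the text of the commands listed in 5.1 and 5.3 to 5.6; it does not connect to a device. The helper names (`group_definitions`, `drain`, `undrain`, `render`) are hypothetical, the forerunner interface is a parameter you choose, and the commands are reproduced exactly as listed. Note that the drain steps shown in 5.3 and 5.4 contain no `apply-groups GROUP-STAT-IF` line, so none is generated here.

```python
# Hypothetical helpers: render the command text from 5.1 and 5.3-5.6 in commit
# order. Nothing here talks to a device; the output is plain text.

Step = tuple[list[str], str]  # (commands, commit comment): one commit per step


def group_definitions(members: dict[str, list[str]]) -> list[str]:
    """5.1: one 'set groups' line per interface, each inheriting GROUP-IF-CFG."""
    lines = ["set groups GROUP-IF-CFG interfaces <*> disable"]
    for group, interfaces in members.items():
        for ifname in interfaces:
            lines.append(f"set groups {group} interfaces {ifname} apply-groups GROUP-IF-CFG")
    return lines


def drain(bgp_group: str = "IPFAB-OVERLAY") -> list[Step]:
    """5.3-5.4: withdraw Type-1 routes, bring LAGs down, then close the VTEP."""
    return [
        ([f"set protocols bgp group {bgp_group} export REJECT-T1"], "BGP export REJECT-T1"),
        (["set interfaces apply-groups GROUP-STRG-IF",
          "set interfaces apply-groups GROUP-LACP-IF"], "LACP interfaces down"),
        ([f"delete protocols bgp group {bgp_group} export",
          f"set protocols bgp group {bgp_group} export REJECT"], "BGP export REJECT"),
    ]


def undrain(forerunner: str, bgp_group: str = "IPFAB-OVERLAY") -> list[Step]:
    """5.6: BGP export, forerunner, storage, LACP, static, then clean-up."""
    return [
        ([f"delete protocols bgp group {bgp_group} export"], "BGP export"),
        ([f"set interfaces {forerunner} apply-groups-except GROUP-STAT-IF"],
         "First Static Interface Up"),
        # Verify that the remote VTEPs are up before this commit (manual check).
        (["delete interfaces apply-groups GROUP-STRG-IF"], "bond Storage Up"),
        (["delete interfaces apply-groups GROUP-LACP-IF"], "LACP interfaces up"),
        (["delete interfaces apply-groups GROUP-STAT-IF"], "Static Interface Up"),
        ([f"delete interfaces {forerunner} apply-groups-except GROUP-STAT-IF"],
         "Clean-up Undrain completed"),
    ]


def render(steps: list[Step]) -> str:
    """Emit each step's commands followed by its commit comment."""
    out = []
    for commands, comment in steps:
        out.extend(commands)
        out.append(f'commit comment "{comment}"')
    return "\n".join(out)
```

For example, `print(render(undrain("xe-0/0/8")))` prints the undrain sequence using xe-0/0/8 (one of the static interfaces from 5.1) as the forerunner. Keeping each step as a (commands, comment) pair mirrors the one-commit-per-step approach above, which makes each stage easy to identify in `show system commit`.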

Note: During the maintenance-mode procedure, there may be a lag between the local interface reaching the collecting-distributing state and the VTEP being established on a remote leaf. If the local server sends traffic to this interface immediately, the remote leaf may drop some of the initial packets.
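One way to reduce this loss is to hold back the next undrain commit until the remote leafs report the VTEP toward this node, in line with the "verify that the remote VTEPs are up" comment in 5.6. A minimal polling sketch, assuming you already have some way to check remote VTEP state: the `remote_vteps_up` callable is hypothetical, and the 60-second timeout and 2-second interval are arbitrary illustration values, not recommendations from this article.

```python
import time
from typing import Callable


def wait_for_remote_vteps(remote_vteps_up: Callable[[], bool],
                          timeout_s: float = 60.0,
                          interval_s: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep,
                          clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll the hypothetical check until it returns True or the timeout expires.

    Returns True if the remote VTEPs came up in time, False otherwise.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if remote_vteps_up():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A False result means the next interface group should stay down and the remote VTEP state should be investigated before continuing.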
