[MX/Subscriber Management] L2TP LNS sessions fail to log in again after LAC-facing interface flaps

Article ID: KB29314
KB Last Updated: 23 Dec 2014
Version: 1.0

Summary:

This article explains why Layer 2 Tunneling Protocol (L2TP) network server (LNS) sessions can fail to log back in after a flap on an L2TP access concentrator (LAC)-facing link, when the flap lasts longer than the Point-to-Point Protocol (PPP) Keep Alive (KA) timeout typically found in PPP modems.

Symptoms:

The situation is highlighted in the following scenario:

  • An MX480 is configured for L2TP LNS.

  • The system is terminating ~6k tunnels and 24k L2TP sessions.

  • When an LAC-facing interface flaps for less than 30 seconds, the L2TP session drops due to PPP Keep Alive (KA) failures from the client.

  • PPP modems typically have a 10-second KA timer and a 30-second dead timer.

  • The tunnels stay up, as the hello timers are at 60 seconds (the default).

  • When this occurs, the PPP (L2TP session) client redials and generates ICRQ (incoming call requests). In this instance, about 4k tunnels or 8k sessions are affected.

  • When the LAC-facing link flap lasts 30 seconds or more, all the PPP clients fail KA and start to redial. This behavior is correct, but the MX cannot process all the ICRQ messages because the kernel and the jl2tpd process are overwhelmed.

  • The result: Sessions that go down cannot log back in. A few may succeed, but most, if not all, will not, as shown in the example below:

lab@mx_lab-re1# run show services l2tp destination
  Local Name    Remote IP        Tunnels       Sessions  State
  10            67.69.203.89     2000          8.0k      Enabled    
  11            67.69.121.169    2000          8.0k      Enabled    
  12            67.69.203.241    2000          8.0k      Enabled    
MASTER[edit]
lab@mx_lab-re1# run show subscribers summary port    
Interface           Count 
si-1/0/0            8000               
si-1/1/0            8000               
si-2/0/0            4000               
si-2/1/0            4000       

Induce link flap for 30 seconds on 2 LAC-facing interfaces:

lab@mx_lab-re1# run show subscribers summary    

Subscribers by State
   Init: 441
   Active: 8000
   Terminating: 8885
   Terminated: 4995
   Total: 22321
MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination                   
  Local Name    Remote IP        Tunnels       Sessions  State
  10            67.69.203.89     2000          558       Enabled    
  11            67.69.121.169    2000          8.0k      Enabled    
  12            67.69.203.241    2000          1757      Enabled    

As you can see, the tunnels stay up but the sessions drop, as expected, due to PPP KA timeout failures from the clients. The clients then start to redial, and this is the point at which they stall and cannot reconnect.

Symptoms: A large number of subscribers remain in the Init state, and the Active count barely increments or does not increment at all.

lab@mx_lab-re1# run show subscribers summary                 

Subscribers by State
   Init: 7925
   Active: 8000
   Total: 15925

Subscribers by Client Type
   L2TP: 15925
   Total: 15925

MASTER[edit]

No RADIUS access requests are being sent to the RADIUS server; the authentication counters do not change between two consecutive samples:

lab@mx_lab-re1# run show network-access aaa statistics authentication
Authentication module statistics
  Requests received: 49163
  Multistack requests: 0
  Accepts: 49163
  Rejects: 0
  Challenges: 0
  Requests timed out: 0

MASTER[edit]
lab@mx_lab-re1# run show network-access aaa statistics authentication    
Authentication module statistics
  Requests received: 49163
  Multistack requests: 0
  Accepts: 49163
  Rejects: 0
  Challenges: 0
  Requests timed out: 0

Tunnels may start dropping and the session counts fluctuate, but new sessions never establish.

MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination 
  Local Name    Remote IP        Tunnels       Sessions  State
  10            67.69.203.89     1217          91        Enabled    
  11            67.69.121.169    2000          8.0k      Enabled    
  12            67.69.203.241    1948          5.1k      Enabled    
  13            67.69.121.209    0             0         Enabled    

MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination    
  Local Name    Remote IP        Tunnels       Sessions  State
  10            67.69.203.89     1209          75        Enabled    
  11            67.69.121.169    2000          8.0k      Enabled    
  12            67.69.203.241    1949          5.1k      Enabled    
  13            67.69.121.209    0             0         Enabled    

MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination    
  Local Name    Remote IP        Tunnels       Sessions  State
  10            67.69.203.89     1204          57        Enabled    
  11            67.69.121.169    2000          8.0k      Enabled    
  12            67.69.203.241    1950          5.0k      Enabled    
  13            67.69.121.209    0             0         Enabled

Routing Engine CPU utilization is very high, with jl2tpd and the kernel consuming the most cycles.

Routing Engine status:
  Slot 1:
    Current state                  Master
    Election priority              Master
    Temperature                 30 degrees C / 86 degrees F
    CPU temperature             28 degrees C / 82 degrees F
    DRAM                      16352 MB (16384 MB installed)
    Memory utilization          18 percent
    CPU utilization:
      User                      58 percent
      Background                 0 percent
      Kernel                    33 percent
      Interrupt                  9 percent
      Idle                       0 percent
    Model                          RE-S-1800x4
    Serial ID                      9009063485
    Start time                     2014-06-26 13:49:33 PDT
    Uptime                         11 days, 18 hours, 9 minutes, 48 seconds
    Last reboot reason             Router rebooted after a normal shutdown.
    Load averages:                 1 minute   5 minute  15 minute
                                      13.05      14.67      13.52
									  
MASTER[edit]
lab@mx_lab-re1# run show system processes extensive 
last pid: 36752;  load averages: 13.61, 14.69, 13.55  up 11+18:10:42    07:59:46
152 processes: 19 running, 118 sleeping, 15 waiting

Mem: 1650M Active, 145M Inact, 312M Wired, 1283M Cache, 214M Buf, 12G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 2255 root        3  20    0   254M   191M sigwai  21:03 33.98% jl2tpd
 2264 root        1  99    0   121M 85544K select   9:05 10.21% dcd
 2263 root        1  99    0    98M 42604K RUN      3:10  5.57% cosd
   20 root        1 -68 -187     0K    16K WAIT    33:34  5.22% irq11: em0 em1 em*
 2254 root        3  20    0   282M   217M sigwai  14:37  4.15% jpppd
 2318 root        1 109    0   142M   109M RUN      6:20  2.59% mib2d
 2219 root        1 106    0 11776K  8132K RUN      1:15  2.05% rmopd
 2215 root        1   4    0   132M 80572K kqread   9:07  2.00% rpd
 1592 root        2   8  -88   117M 16148K nanslp 512:53  1.76% chassisd
 2250 root        1  99    0   243M   177M select  97:41  1.12% authd
 1613 root        1 103    0 74532K 67784K RUN      3:40  1.07% shm-rtsdbd
 2361 root        1 103    0   177M   115M RUN      5:57  1.03% dfwd
   12 root        1 -40 -159     0K    16K WAIT     6:26  0.93% swi2: netisr 0

Cause:

The underlying issue is that the kernel and jl2tpd cannot keep up with the number of ICRQ messages (incoming call requests) that seek to establish the L2TP sessions (PPP). The MX does not drop the ICRQ messages, but it takes a long time to answer each one with an ICRP (incoming call reply). By the time the ICRP is sent, the client has already timed out and retried, so the reply is wasted. This behavior prevents sessions from logging back in.
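
The stall described above can be illustrated with a toy queueing model. The Python sketch below is not taken from Junos: the function `simulate`, the client count, the capacity of 200 requests per second, and the 5-second client timeout are all hypothetical values chosen for illustration. It assumes a FIFO queue in which a late reply is wasted because the client has already redialed, and it models a rate limiter simply as a cap on requests admitted per second.

```python
from collections import deque


def simulate(clients=8000, capacity=200, timeout=5, admit_limit=None, duration=120):
    """Toy model: return how many clients log in within `duration` seconds.

    All parameters are hypothetical illustration values, not MX measurements.
    clients     -- number of PPP clients redialing after the flap
    capacity    -- call requests the control plane can answer per second
    timeout     -- seconds a client waits for a reply before it redials
    admit_limit -- requests per second let through by a rate limiter
                   (None means every request is queued)
    """
    # Time of each pending client's latest attempt; the initial values
    # stagger the first dial of each client across one timeout period.
    pending = {c: (c % timeout) - timeout for c in range(clients)}
    queue = deque()  # FIFO of (client, send_time) waiting for a reply
    done = 0
    for now in range(duration):
        # Clients whose latest attempt has timed out send a new request.
        arrivals = [c for c, sent in pending.items() if now - sent >= timeout]
        for c in arrivals:
            pending[c] = now
        admitted = arrivals if admit_limit is None else arrivals[:admit_limit]
        queue.extend((c, now) for c in admitted)
        # The control plane answers up to `capacity` requests, oldest first.
        for _ in range(min(capacity, len(queue))):
            c, sent = queue.popleft()
            # A reply only counts if the client has not redialed meanwhile.
            if pending.get(c) == sent:
                del pending[c]
                done += 1
    return done


if __name__ == "__main__":
    print("no rate limit:", simulate())
    print("rate limited: ", simulate(admit_limit=200))
```

Under these assumed numbers, the run without a rate limit stops making progress after the first few seconds, because every request reaching the head of the queue is already stale. The rate-limited run keeps the queue short and eventually logs in every client.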

Solution:

To help keep the kernel and jl2tpd from being overwhelmed, use DDoS protection for L2TP with a lower aggregate rate.

By default, the DDoS aggregate policer for L2TP is set to 20 kpps, which is too high for this scenario. Lowering the aggregate rate from 20 kpps to 250 pps keeps the kernel and jl2tpd from being overwhelmed.

Setting the DDoS rate to 250 pps could have a side effect, however: DDoS protection for L2TP does not distinguish between packet types and applies only to the aggregate. Because of this, L2TP hello packets for existing tunnels could be dropped during periods of high call setup rates, when L2TP is in violation and L2TP packets are being dropped. Therefore, it is recommended that the tunnel-group hello-interval be changed from the default of 60 seconds to 0 seconds. This change prevents the system from sending hellos, and allows the system to respond only to hellos from the LACs.

Example: DDoS setting for L2TP

lab@mx_lab-re1# show system ddos-protection protocols l2tp 
aggregate {
    bandwidth 250;
    burst 250;
    recover-time 90;
}
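
The effect of the `bandwidth` and `burst` values above can be sketched with a generic token-bucket model in Python. This is an illustration only: it assumes the policer behaves like a textbook token bucket, and the names `token_bucket` and `steady`, as well as the offered rates in the usage lines, are hypothetical and not taken from Junos internals.

```python
def token_bucket(arrival_times, rate=250.0, burst=250.0):
    """Return (passed, dropped) for packets arriving at the given times.

    rate  -- tokens added per second (modeled on `bandwidth 250`)
    burst -- bucket depth in packets (modeled on `burst 250`)
    Assumes a textbook token bucket; times are in seconds, ascending.
    """
    tokens = burst  # the bucket starts full
    last = 0.0
    passed = dropped = 0
    for t in arrival_times:
        # Refill for the elapsed time, never beyond the bucket depth.
        tokens = min(burst, tokens + (t - last) * rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            passed += 1
        else:
            dropped += 1
    return passed, dropped


def steady(pps, seconds):
    """Evenly spaced arrival times for a constant offered rate."""
    return [i / pps for i in range(pps * seconds)]


if __name__ == "__main__":
    print("200 pps offered: ", token_bucket(steady(200, 2)))
    print("1000 pps offered:", token_bucket(steady(1000, 2)))
```

In this model, an offered load below 250 pps passes untouched. A sustained load above it first drains the 250-packet burst allowance and is then held to roughly 250 pps, with everything else dropped regardless of packet type.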

L2TP tunnel-group hello-interval set to 0

tunnel-group tg1-N00009-1-1-1 {
    l2tp-access-profile bell-l2tp-access-profile;
    aaa-access-profile aaa-profile;
    hello-interval 0;
    local-gateway {
        address 67.69.121.170;
    }
    service-device-pool slot1-tunnel-pool;
    dynamic-profile dynamic-profile-1;
}

lab@mx_lab-re1# run show ddos-protection protocols l2tp aggregate                          
Currently tracked flows: 0, Total detected flows: 0
* = User configured value

Protocol Group: L2TP

  Packet type: aggregate (Aggregate for all L2TP traffic)
    Aggregate policer configuration:
      Bandwidth:        250 pps*
      Burst:            250 packets*
      Recover time:     90 seconds*
      Enabled:          Yes
    Flow detection configuration:
      Detection mode: Automatic  Detect time:  3 seconds
      Log flows:      Yes        Recover time: 60 seconds
      Timeout flows:  No         Timeout time: 300 seconds
      Flow aggregation level configuration:
        Aggregation level   Detection mode  Control mode  Flow rate
        Subscriber          Automatic       Drop          10 pps
        Logical interface   Automatic       Drop          10 pps
        Physical interface  Automatic       Drop          20000 pps
    System-wide information:
      Aggregate bandwidth is being violated!
        No. of FPCs currently receiving excess traffic: 1
        No. of FPCs that have received excess traffic:  1
        Violation first detected at: 2014-07-08 09:14:41 PDT
        Violation last seen at:      2014-07-08 09:18:12 PDT
        Duration of violation: 00:03:31 Number of violations: 2
      Received:  6973954             Arrival rate:     617 pps
      Dropped:   91100               Max arrival rate: 1729 pps
    Routing Engine information:
      Bandwidth: 250 pps, Burst: 250 packets, enabled
      Aggregate policer is no longer being violated
        Last violation started at: 2014-07-08 07:08:52 PDT
        Last violation ended at:   2014-07-08 07:08:55 PDT
        Duration of last violation: 00:00:03 Number of violations: 1
      Received:  6883414             Arrival rate:     250 pps
      Dropped:   197                 Max arrival rate: 2811 pps
        Dropped by individual policers: 0
        Dropped by aggregate policer:   197
    FPC slot 1 information:
      Bandwidth: 100% (250 pps), Burst: 100% (250 packets), enabled
      Aggregate policer is never violated
      Received:  1153414             Arrival rate:     0 pps
      Dropped:   0                   Max arrival rate: 1133 pps
        Dropped by individual policers: 0
        Dropped by flow suppression:    0
    FPC slot 2 information:
      Bandwidth: 100% (250 pps), Burst: 100% (250 packets), enabled
      Aggregate policer is currently being violated!
        Violation first detected at: 2014-07-08 09:14:41 PDT
        Violation last seen at:      2014-07-08 09:18:12 PDT
        Duration of violation: 00:03:31 Number of violations: 1
      Received:  5820540             Arrival rate:     617 pps
      Dropped:   90903               Max arrival rate: 1729 pps
        Dropped by individual policers: 0
        Dropped by aggregate policer:   90903
        Dropped by flow suppression:    0
      Flow counts:
        Aggregation level     Current       Total detected   State
        Subscriber            0             0                Active
        Logical-interface     0             0                Active
        Physical-interface    0             0                Active
    FPC slot 3 information:
      Bandwidth: 100% (250 pps), Burst: 100% (250 packets), enabled
      Aggregate policer is never violated
      Received:  0                   Arrival rate:     0 pps
      Dropped:   0                   Max arrival rate: 0 pps
        Dropped by individual policers: 0
        Dropped by flow suppression:    0

MASTER[edit]
lab@mx_lab-re1# 

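With the DDoS settings in place, L2TP goes into the violation state during the redial burst, which is the intended behavior. Policing the excess packets allows the kernel and jl2tpd to keep up, so subscribers can log in successfully, as the rising Active count in the following samples shows: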

MASTER[edit]
lab@mx_lab-re1# run show subscribers summary 

Subscribers by State
   Init: 3154
   Configured: 39
   Active: 11116
   Total: 14309

Subscribers by Client Type
   L2TP: 14309
   Total: 14309

MASTER[edit]
lab@mx_lab-re1# run show subscribers summary    

Subscribers by State
   Init: 3071
   Configured: 110
   Active: 11146
   Total: 14327

Subscribers by Client Type
   L2TP: 14327
   Total: 14327

MASTER[edit]
lab@mx_lab-re1# run show subscribers summary    

Subscribers by State
   Init: 3032
   Configured: 118
   Active: 11186
   Total: 14336

Subscribers by Client Type
   L2TP: 14336
   Total: 14336

MASTER[edit]
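
Recovery can be checked by comparing the Active counter across successive samples. The Python sketch below parses the three `show subscribers summary` samples shown above. The helper names `parse_states` and `active_deltas` are hypothetical, and because the samples carry no timestamps, the sketch reports only the change between samples, not a login rate.

```python
import re

# The three samples from this article, in the order they were taken.
SAMPLES = [
    """Subscribers by State
   Init: 3154
   Configured: 39
   Active: 11116
   Total: 14309

Subscribers by Client Type
   L2TP: 14309
   Total: 14309
""",
    """Subscribers by State
   Init: 3071
   Configured: 110
   Active: 11146
   Total: 14327
""",
    """Subscribers by State
   Init: 3032
   Configured: 118
   Active: 11186
   Total: 14336
""",
]


def parse_states(output):
    """Extract the 'Subscribers by State' counters from one CLI sample."""
    states = {}
    in_states = False
    for line in output.splitlines():
        line = line.strip()
        if line == "Subscribers by State":
            in_states = True
            continue
        if line.startswith("Subscribers by"):
            in_states = False  # a different section, e.g. by client type
            continue
        match = re.fullmatch(r"([A-Za-z]+):\s+(\d+)", line)
        if in_states and match:
            states[match.group(1)] = int(match.group(2))
    return states


def active_deltas(samples):
    """Change in the Active counter between consecutive samples."""
    active = [parse_states(s)["Active"] for s in samples]
    return [later - earlier for earlier, later in zip(active, active[1:])]


if __name__ == "__main__":
    print(active_deltas(SAMPLES))  # prints [30, 40]
```

Positive deltas together with a shrinking Init count indicate that logins are progressing. A delta at or near zero while Init stays high matches the stalled state described in the Symptoms section.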
