[MX/Subscriber Management] L2TP LNS sessions fail to log in again after LAC-facing interface flaps
This article explains why Layer 2 Tunneling Protocol (L2TP) network server (LNS) sessions can fail to log back in after an L2TP access concentrator (LAC)-facing link flap that lasts longer than the Point-to-Point Protocol (PPP) Keep Alive (KA) timeout typically found in PPP modems.
The situation is highlighted in the following scenario:
An MX480 is configured for L2TP LNS.
The system is terminating ~6k tunnels and 24k L2TP sessions.
When a LAC-facing interface flaps for less than 30 seconds, some L2TP sessions drop due to PPP Keep Alive (KA) failures from the clients.
PPP modems typically have a 10-second KA timer and a 30-second dead timer.
The tunnels stay up, as the hello timers are at 60 seconds (the default).
When this occurs, the PPP clients (L2TP sessions) redial and generate ICRQ (Incoming-Call-Request) messages. In this instance, about 4k tunnels, or 8k sessions, are affected.
When the LAC-facing link flap lasts 30 seconds or more, all of the PPP clients fail KA and start to redial. This behavior is correct, but the MX cannot process all of the ICRQ messages because the kernel and the jl2tpd process are overwhelmed.
The result: sessions that go down cannot log back in. A few sessions may reconnect, but most, if not all, will not, as shown in the example below:
lab@mx_lab-re1# run show services l2tp destination
Local Name  Remote IP      Tunnels  Sessions  State
10          67.69.203.89   2000     8.0k      Enabled
11          67.69.121.169  2000     8.0k      Enabled
12          67.69.203.241  2000     8.0k      Enabled
MASTER[edit]
lab@mx_lab-re1# run show subscribers summary port
Interface  Count
si-1/0/0   8000
si-1/1/0   8000
si-2/0/0   4000
si-2/1/0   4000
Induce a link flap for 30 seconds on two LAC-facing interfaces:
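One way to induce such a flap in a lab is to administratively disable the LAC-facing interfaces, commit, wait 30 seconds, and re-enable them. The interface names below are hypothetical examples, not taken from this setup:

```
lab@mx_lab-re1# set interfaces xe-0/0/0 disable
lab@mx_lab-re1# set interfaces xe-0/0/1 disable
lab@mx_lab-re1# commit
(wait 30 seconds)
lab@mx_lab-re1# delete interfaces xe-0/0/0 disable
lab@mx_lab-re1# delete interfaces xe-0/0/1 disable
lab@mx_lab-re1# commit
```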
lab@mx_lab-re1# run show subscribers summary
Subscribers by State
   Init: 441
   Active: 8000
   Terminating: 8885
   Terminated: 4995
   Total: 22321
MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination
Local Name  Remote IP      Tunnels  Sessions  State
10          67.69.203.89   2000     558       Enabled
11          67.69.121.169  2000     8.0k      Enabled
12          67.69.203.241  2000     1757      Enabled
As expected, the tunnels stay up but the sessions drop due to PPP KA timeout failures on the clients. The clients start to redial, and this is where they stall and cannot reconnect.
Symptoms: a high number of subscribers in the Init state, with the Active count barely incrementing or not incrementing at all.
lab@mx_lab-re1# run show subscribers summary
Subscribers by State
   Init: 7925
   Active: 8000
   Total: 15925
Subscribers by Client Type
   L2TP: 15925
   Total: 15925
MASTER[edit]
No RADIUS access requests are being sent to the RADIUS server (the counters are identical across consecutive runs):

lab@mx_lab-re1# run show network-access aaa statistics authentication
Authentication module statistics
    Requests received: 49163
    Multistack requests: 0
    Accepts: 49163
    Rejects: 0
    Challenges: 0
    Requests timed out: 0
MASTER[edit]
lab@mx_lab-re1# run show network-access aaa statistics authentication
Authentication module statistics
    Requests received: 49163
    Multistack requests: 0
    Accepts: 49163
    Rejects: 0
    Challenges: 0
    Requests timed out: 0
Tunnels may start dropping and session counts fluctuate, but sessions never establish.
MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination
Local Name  Remote IP      Tunnels  Sessions  State
10          67.69.203.89   1217     91        Enabled
11          67.69.121.169  2000     8.0k      Enabled
12          67.69.203.241  1948     5.1k      Enabled
13          67.69.121.209  0        0         Enabled
MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination
Local Name  Remote IP      Tunnels  Sessions  State
10          67.69.203.89   1209     75        Enabled
11          67.69.121.169  2000     8.0k      Enabled
12          67.69.203.241  1949     5.1k      Enabled
13          67.69.121.209  0        0         Enabled
MASTER[edit]
lab@mx_lab-re1# run show services l2tp destination
Local Name  Remote IP      Tunnels  Sessions  State
10          67.69.203.89   1204     57        Enabled
11          67.69.121.169  2000     8.0k      Enabled
12          67.69.203.241  1950     5.0k      Enabled
13          67.69.121.209  0        0         Enabled
Routing Engine CPU utilization is very high, with jl2tpd and the kernel consuming the most cycles.
Routing Engine status:
  Slot 1:
    Current state                  Master
    Election priority              Master
    Temperature                 30 degrees C / 86 degrees F
    CPU temperature             28 degrees C / 82 degrees F
    DRAM                     16352 MB (16384 MB installed)
    Memory utilization          18 percent
    CPU utilization:
      User                      58 percent
      Background                 0 percent
      Kernel                    33 percent
      Interrupt                  9 percent
      Idle                       0 percent
    Model                          RE-S-1800x4
    Serial ID                      9009063485
    Start time                     2014-06-26 13:49:33 PDT
    Uptime                         11 days, 18 hours, 9 minutes, 48 seconds
    Last reboot reason             Router rebooted after a normal shutdown.
    Load averages:                 1 minute   5 minute  15 minute
                                      13.05      14.67      13.52
MASTER[edit]
lab@mx_lab-re1# run show system processes extensive
last pid: 36752;  load averages: 13.61, 14.69, 13.55  up 11+18:10:42  07:59:46
152 processes: 19 running, 118 sleeping, 15 waiting
Mem: 1650M Active, 145M Inact, 312M Wired, 1283M Cache, 214M Buf, 12G Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
 2255 root        3  20    0   254M   191M sigwai  21:03  33.98% jl2tpd
 2264 root        1  99    0   121M 85544K select   9:05  10.21% dcd
 2263 root        1  99    0    98M 42604K RUN      3:10   5.57% cosd
   20 root        1 -68 -187     0K    16K WAIT    33:34   5.22% irq11: em0 em1 em*
 2254 root        3  20    0   282M   217M sigwai  14:37   4.15% jpppd
 2318 root        1 109    0   142M   109M RUN      6:20   2.59% mib2d
 2219 root        1 106    0 11776K  8132K RUN      1:15   2.05% rmopd
 2215 root        1   4    0   132M 80572K kqread   9:07   2.00% rpd
 1592 root        2   8  -88   117M 16148K nanslp 512:53   1.76% chassisd
 2250 root        1  99    0   243M   177M select  97:41   1.12% authd
 1613 root        1 103    0 74532K 67784K RUN      3:40   1.07% shm-rtsdbd
 2361 root        1 103    0   177M   115M RUN      5:57   1.03% dfwd
   12 root        1 -40 -159     0K    16K WAIT     6:26   0.93% swi2: netisr 0
The underlying issue is that the kernel and jl2tpd cannot keep up with the volume of ICRQ (Incoming-Call-Request) messages seeking to establish the L2TP (PPP) sessions. The MX does not drop the ICRQs; instead, it takes so long to respond with an ICRP (Incoming-Call-Reply) that the client has already timed out and retried by the time the ICRP is sent. This behavior prevents sessions from logging back in.
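The stall can be understood with simple queueing arithmetic: once ICRQs arrive faster than the Routing Engine can answer them, the backlog, and therefore the response delay, grows without bound, so every ICRP eventually arrives after the client has given up and retried. The following sketch illustrates this with assumed numbers (the rates and timeout below are illustrative, not values measured on this system):

```python
# Illustrative model of the ICRQ backlog during a redial storm.
# All three numbers are assumptions for the sketch, not measured values.
ARRIVAL_PPS = 2000   # redialing clients sending ICRQs
SERVICE_PPS = 500    # rate at which the overloaded RE can answer
TIMEOUT_S = 10       # client gives up and retransmits after this delay

backlog = 0
for second in range(1, 61):
    backlog += ARRIVAL_PPS - SERVICE_PPS   # net queue growth per second
    delay = backlog / SERVICE_PPS          # wait before a new ICRQ is answered
    if delay > TIMEOUT_S:
        print(f"after {second}s the ICRP arrives too late "
              f"({delay:.0f}s > {TIMEOUT_S}s); clients retry, adding load")
        break
```

In this model the response delay exceeds the client timeout after only a few seconds, and each retry adds yet more ICRQs, which is exactly the amplification the DDOS policer below is meant to break.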
To help keep the kernel and jl2tpd from being overwhelmed, tighten DDOS protection for L2TP.
By default, the DDOS policer for L2TP is an aggregate of 20 kpps, which is far too high for this workload. Changing the aggregate rate from 20 kpps to 250 pps keeps the kernel and jl2tpd from being overwhelmed.
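The effect of the policer can be modeled as a simple token bucket. This is only a sketch of the general technique, not the actual Junos DDOS-protection implementation; the parameters mirror the recommended settings (250 pps rate, 250-packet burst):

```python
# Token-bucket sketch of an aggregate policer (illustrative only,
# not the actual Junos DDOS-protection implementation).
def police(arrivals_per_sec, rate=250, burst=250):
    """Return (passed, dropped) totals for per-second arrival counts."""
    tokens = burst                           # bucket starts full
    passed = dropped = 0
    for arrivals in arrivals_per_sec:
        tokens = min(burst, tokens + rate)   # refill once per simulated second
        ok = min(arrivals, tokens)           # packets that find a token pass
        tokens -= ok
        passed += ok
        dropped += arrivals - ok
    return passed, dropped

# A 10-second redial storm of 2,000 ICRQs per second:
print(police([2000] * 10))   # → (2500, 17500): only rate x duration passes
```

The excess ICRQs are dropped and retried by the clients, while the 250 pps that do reach the Routing Engine can be answered before the clients time out.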
Setting the DDOS policer to 250 pps can have a side effect, however: DDOS protection for L2TP does not break out packet types and applies only to the aggregate. As a result, L2TP hello packets for existing tunnels could be dropped during these high rates of call setup, while the L2TP aggregate is in violation and packets are being policed. It is therefore recommended to change the tunnel-group hello-interval from the default of 60 seconds to 0 seconds. This change prevents the system from sending hellos, while still allowing it to respond to hellos from the LACs.
Example: DDOS setting for L2TP
lab@mx_lab-re1# show system ddos-protection protocols l2tp
aggregate {
    bandwidth 250;
    burst 250;
    recover-time 90;
}
L2TP tunnel-group hello-interval set to 0:
tunnel-group tg1-N00009-1-1-1 {
    l2tp-access-profile bell-l2tp-access-profile;
    aaa-access-profile aaa-profile;
    hello-interval 0;
    local-gateway {
        address 67.69.121.170;
    }
    service-device-pool slot1-tunnel-pool;
    dynamic-profile dynamic-profile-1;
}

lab@mx_lab-re1# run show ddos-protection protocols l2tp aggregate
Currently tracked flows: 0, Total detected flows: 0
* = User configured value
Protocol Group: L2TP

  Packet type: aggregate (Aggregate for all L2TP traffic)
    Aggregate policer configuration:
      Bandwidth:        250 pps*
      Burst:            250 packets*
      Recover time:     90 seconds*
      Enabled:          Yes
    Flow detection configuration:
      Detection mode: Automatic  Detect time:  3 seconds
      Log flows:      Yes        Recover time: 60 seconds
      Timeout flows:  No         Timeout time: 300 seconds
      Flow aggregation level configuration:
        Aggregation level   Detection mode  Control mode  Flow rate
        Subscriber          Automatic       Drop          10 pps
        Logical interface   Automatic       Drop          10 pps
        Physical interface  Automatic       Drop          20000 pps
    System-wide information:
      Aggregate bandwidth is being violated!
        No. of FPCs currently receiving excess traffic: 1
        No. of FPCs that have received excess traffic:  1
        Violation first detected at: 2014-07-08 09:14:41 PDT
        Violation last seen at:      2014-07-08 09:18:12 PDT
        Duration of violation: 00:03:31  Number of violations: 2
      Received:  6973954             Arrival rate:     617 pps
      Dropped:   91100               Max arrival rate: 1729 pps
    Routing Engine information:
      Bandwidth: 250 pps, Burst: 250 packets, enabled
      Aggregate policer is no longer being violated
        Last violation started at: 2014-07-08 07:08:52 PDT
        Last violation ended at:   2014-07-08 07:08:55 PDT
        Duration of last violation: 00:00:03  Number of violations: 1
      Received:  6883414             Arrival rate:     250 pps
      Dropped:   197                 Max arrival rate: 2811 pps
        Dropped by individual policers: 0
        Dropped by aggregate policer:   197
    FPC slot 1 information:
      Bandwidth: 100% (250 pps), Burst: 100% (250 packets), enabled
      Aggregate policer is never violated
      Received:  1153414             Arrival rate:     0 pps
      Dropped:   0                   Max arrival rate: 1133 pps
        Dropped by individual policers: 0
        Dropped by flow suppression:    0
    FPC slot 2 information:
      Bandwidth: 100% (250 pps), Burst: 100% (250 packets), enabled
      Aggregate policer is currently being violated!
        Violation first detected at: 2014-07-08 09:14:41 PDT
        Violation last seen at:      2014-07-08 09:18:12 PDT
        Duration of violation: 00:03:31  Number of violations: 1
      Received:  5820540             Arrival rate:     617 pps
      Dropped:   90903               Max arrival rate: 1729 pps
        Dropped by individual policers: 0
        Dropped by aggregate policer:   90903
        Dropped by flow suppression:    0
      Flow counts:
        Aggregation level    Current  Total detected  State
        Subscriber           0        0               Active
        Logical-interface    0        0               Active
        Physical-interface   0        0               Active
    FPC slot 3 information:
      Bandwidth: 100% (250 pps), Burst: 100% (250 packets), enabled
      Aggregate policer is never violated
      Received:  0                   Arrival rate:     0 pps
      Dropped:   0                   Max arrival rate: 0 pps
        Dropped by individual policers: 0
        Dropped by flow suppression:    0
MASTER[edit]
lab@mx_lab-re1#
With the DDOS settings in place, the L2TP protocol group goes into the violation state, which is expected. This allows the kernel and jl2tpd to keep up, and subscribers can log in successfully.
MASTER[edit]
lab@mx_lab-re1# run show subscribers summary
Subscribers by State
   Init: 3154
   Configured: 39
   Active: 11116
   Total: 14309
Subscribers by Client Type
   L2TP: 14309
   Total: 14309
MASTER[edit]
lab@mx_lab-re1# run show subscribers summary
Subscribers by State
   Init: 3071
   Configured: 110
   Active: 11146
   Total: 14327
Subscribers by Client Type
   L2TP: 14327
   Total: 14327
MASTER[edit]
lab@mx_lab-re1# run show subscribers summary
Subscribers by State
   Init: 3032
   Configured: 118
   Active: 11186
   Total: 14336
Subscribers by Client Type
   L2TP: 14336
   Total: 14336
MASTER[edit]
Related Links
Getting Up and Running with Junos