[SRX] TCP for RTLOG does not work over an Active-Active HA deployment

  [KB32393]


Summary:

This article explains why RTLOG data sent to a log server over TCP transport does not work in an Active-Active chassis cluster deployment.

Symptoms:

  • The SRX sends TCP resets to the log server.
  • RTLOG_CONN_ERROR messages are reported in the messages log.
  • The log server does not receive some RTLOG data.
 

Topology Example:

NetworkA --- 2.2.2.1 (reth1) SRX-Node0 (reth0) 1.1.1.1 --- Syslog server (1.1.1.2)
                             |      |
                          Control  Fabric
                             |      |
NetworkB --- 3.3.3.1 (reth2) SRX-Node1 (reth3) 4.4.4.1 ---- NetworkC
 

Test steps:

  1. Generate ICMP/TCP traffic from NetworkA to 1.1.1.2.
    This hits a security policy that has logging enabled.
    This is a non-fabric-traversal scenario.
  2. Generate ICMP/TCP traffic from NetworkB to NetworkC.
    This hits another security policy that has logging enabled.
    The traffic logs must traverse the fabric link to reach the Syslog server, which creates a forward session for logging.
  3. Show the flow session for each node:
    {primary:node0}
    root@SRX-Node0> show security flow session
    node0:

    Flow Sessions on FPC1 PIC0:

    Session ID: 20064981, Policy name: self-traffic-policy/1, State: Active, Timeout: 78554, Valid
      In: 1.1.1.1/11090 --> 1.1.1.2/6514;tcp, If: .local..0, Pkts: 2, Bytes: 84
      Out: 1.1.1.2/6514 --> 1.1.1.1/11090;tcp, If: reth0.0, Pkts: 23613, Bytes: 944524
    
    Session ID: 20077458, Policy name: self-traffic-policy/1, State: Active, Timeout: 1786, Valid
      In: 1.1.1.1/51654 --> 1.1.1.2/22;tcp, If: .local..0, Pkts: 10, Bytes: 2633
      Out: 1.1.1.2/22 --> 1.1.1.1/51654;tcp, If: reth0.0, Pkts: 10, Bytes: 1977
    Total sessions: 2
    
    node1:

    Flow Sessions on FPC1 PIC0:

    Session ID: 20059688, Policy name: self-traffic-policy/1, State: Backup, Timeout: 117610, Valid
      In: 1.1.1.1/11090 --> 1.1.1.2/6514;tcp, If: .local..0, Pkts: 0, Bytes: 0
      Out: 1.1.1.2/6514 --> 1.1.1.1/11090;tcp, If: reth0.0, Pkts: 0, Bytes: 0
    
    Session ID: 20071553, Policy name: self-traffic-policy/1, State: Backup, Timeout: 14384, Valid
      In: 1.1.1.1/51654 --> 1.1.1.2/22;tcp, If: .local..0, Pkts: 0, Bytes: 0
      Out: 1.1.1.2/22 --> 1.1.1.1/51654;tcp, If: reth0.0, Pkts: 0, Bytes: 0
    
    Session ID: 20071560, Policy name: self-traffic-policy/1, State: Forward, Timeout: 4, Valid <<<<Forward
      In: 1.1.1.1/11505 --> 1.1.1.2/6514;tcp, If: .local..0, Pkts: 1, Bytes: 44
      Out: 1.1.1.2/6514 --> 1.1.1.1/11505;tcp, If: reth0.0, Pkts: 0, Bytes: 0
    
    Session ID: 20071562, Policy name: self-traffic-policy/1, State: Forward, Timeout: 8, Valid
      In: 1.1.1.1/11505 --> 1.1.1.2/6514;tcp, If: .local..0, Pkts: 1, Bytes: 44
      Out: 1.1.1.2/6514 --> 1.1.1.1/11505;tcp, If: reth0.0, Pkts: 0, Bytes: 0
    
    Session ID: 20071563, Policy name: self-traffic-policy/1, State: Backup, Timeout: 78718, Valid
      In: 1.1.1.1/11505 --> 1.1.1.2/6514;tcp, If: .local..0, Pkts: 0, Bytes: 0
      Out: 1.1.1.2/6514 --> 1.1.1.1/11505;tcp, If: reth0.0, Pkts: 0, Bytes: 0
    
    
  4. Review the logs:

    TCP resets are reported in the flow traceoptions when testing the Active-Active scenario:

    Line 3617: Dec 18 05:25:03 05:25:02.946134:CID-01:FPC-01:PIC-00:THREAD_ID-22:RT:flow_process_tcp_rst_and_icmp_error: Processing TCP RST from self/host
    Line 3630: Dec 18 05:25:03 05:25:02.946262:CID-01:FPC-01:PIC-00:THREAD_ID-22:RT:flow_process_tcp_rst_and_icmp_error: TCP RST changing in_ifp to .local..0
    Line 11680: Dec 18 05:25:09 05:25:09.711282:CID-01:FPC-01:PIC-00:THREAD_ID-21:RT:flow_process_tcp_rst_and_icmp_error: Processing TCP RST from self/host
    Line 11693: Dec 18 05:25:09 05:25:09.711415:CID-01:FPC-01:PIC-00:THREAD_ID-21:RT:flow_process_tcp_rst_and_icmp_error: TCP RST changing in_ifp to .local..0
    Line 17496: Dec 18 05:25:12 05:25:12.643366:CID-01:FPC-01:PIC-00:THREAD_ID-23:RT:flow_process_tcp_rst_and_icmp_error: Processing TCP RST from self/host
    Line 17509: Dec 18 05:25:12 05:25:12.643488:CID-01:FPC-01:PIC-00:THREAD_ID-23:RT:flow_process_tcp_rst_and_icmp_error: TCP RST changing in_ifp to .local..0
    Line 23127: Dec 18 05:25:15 05:25:15.641386:CID-01:FPC-01:PIC-00:THREAD_ID-26:RT:flow_process_tcp_rst_and_icmp_error: Processing TCP RST from self/host
    Line 23140: Dec 18 05:25:15 05:25:15.641509:CID-01:FPC-01:PIC-00:THREAD_ID-26:RT:flow_process_tcp_rst_and_icmp_error: TCP RST changing in_ifp to .local..0

    Connection errors are reported at the same time in the Active-Active scenario:

    {primary:node0}
    root@SRX-Node0> show log messages | match RTLOG_CONN_ERROR | last 10
    Dec 18 13:22:23  SRX-Node0 RT_SYSTEM: RTLOG_CONN_ERROR: Connection error tcp_1.1.1.2 Com 11079 abort
    Dec 18 14:31:43  SRX-Node0 RT_SYSTEM: RTLOG_CONN_ERROR: Connection error tcp_1.1.1.2 Com 11080 abort
    Dec 18 14:31:47  SRX-Node0 RT_SYSTEM: RTLOG_CONN_ERROR: Connection error tcp_1.1.1.2 Com 11081 abort
    Dec 18 14:31:51  SRX-Node0 RT_SYSTEM: RTLOG_CONN_ERROR: Connection error tcp_1.1.1.2 Com 11082 abort
    Dec 18 14:34:58  SRX-Node0 RT_SYSTEM: RTLOG_CONN_ERROR: Connection error tcp_1.1.1.2 Com 11083 abort
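
Flow traceoptions output like that shown in step 4 can be collected with a configuration along the following lines. This is a minimal sketch and is not taken from the original case: the trace file name rtlog-trace and the packet-filter name to-logserver are hypothetical, and the addresses and port are those from the topology example and session output in this article.

```
# Hypothetical names: rtlog-trace (trace file), to-logserver (packet filter).
# Configuration mode: trace only log traffic from the SRX to the log server.
set security flow traceoptions file rtlog-trace size 10m files 3
set security flow traceoptions flag basic-datapath
set security flow traceoptions packet-filter to-logserver source-prefix 1.1.1.1/32
set security flow traceoptions packet-filter to-logserver destination-prefix 1.1.1.2/32
set security flow traceoptions packet-filter to-logserver destination-port 6514

# Operational mode, after commit and after reproducing the issue:
# show only the RST-handling entries from the trace file.
root@SRX-Node0> show log rtlog-trace | match flow_process_tcp_rst_and_icmp_error
```

Flow traceoptions add load to the data plane, so it is usually best to remove them (delete security flow traceoptions) once the data has been collected.
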
Cause:

TCP logging is currently not supported in Active-Active chassis cluster setups. In such a setup, the log client residing on the active node is unaware of the log client on the passive node, which has a forward session for logging. As a result, the TCP log traffic is processed incorrectly and RST packets are generated.
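
For reference, this behavior applies to stream-mode security logging that uses TCP transport. A minimal sketch of such a configuration follows. The stream name LOG-SERVER is hypothetical and the exact statements on the affected device are not shown in this article; the addresses and port are taken from the session output above.

```
# Hypothetical stream name: LOG-SERVER.
# Assumed configuration: stream-mode logging over TCP to the Syslog server.
set security log mode stream
set security log source-address 1.1.1.1
set security log transport protocol tcp
set security log stream LOG-SERVER host 1.1.1.2 port 6514
```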

Solution:

Move all Redundancy Groups (RGs) to one node so that the cluster runs as an Active-Passive deployment.
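
One possible way to do this is sketched below. It assumes, hypothetically, that redundancy group 1 is already primary on node0 and that redundancy group 2 is primary on node1. The group numbers and priority values are illustrative only, so check show chassis cluster status on the actual cluster first.

```
# Assumption: RG2 is currently primary on node1 and should join RG1 on node0.
# Configuration mode: make node0 the preferred node for RG2
# (priority values are illustrative).
set chassis cluster redundancy-group 2 node 0 priority 200
set chassis cluster redundancy-group 2 node 1 priority 100

# Operational mode, after commit: move RG2 to node0 manually if it has not
# already moved, then verify which node is primary for each group.
{primary:node0}
root@SRX-Node0> request chassis cluster failover redundancy-group 2 node 0
root@SRX-Node0> show chassis cluster status

# Clear the manual failover flag once RG2 is primary on node0.
root@SRX-Node0> request chassis cluster failover reset redundancy-group 2
```

Once all redundancy groups are primary on the same node, log traffic should no longer need a forward session across the fabric link to reach the Syslog server.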

Modification History:
2019-06-23: Article reviewed for accuracy.