[Contrail] RHOSP16+CEM2011 BGP peering to gateway lost upon controller reboot

Article ID: KB37390 KB Last Updated: 01 Oct 2021 Version: 1.0
Summary:

After a Contrail controller is rebooted in an RHOSP16+CEM2011.L1 environment, the Border Gateway Protocol (BGP) peering between that controller and the Data Center gateway (DCGW) may be lost, and the BGP session stays down. The gateway logs show a "peer as mismatch" message.

This article explains how to troubleshoot and work around the issue.

Symptoms:

After a controller reboot, the BGP peering between the Data Center gateway (DCGW) and the Contrail controller is lost. On the DCGW node, the BGP session to the affected controller is down and log entries containing "peer as mismatch" are recorded:

root@dcgw> show bgp summary |match 10.10.10  
10.10.10.7           64999          0          0       0      14       23:53 Active   <<<
10.10.10.8           64999      12054      14131       0       9 3d 16:48:18 Establ
10.10.10.16          64999      12356      14143       0       9 3d 16:54:37 Establ
root@dcgw> show log messages | match 10.10.10.7
Jun 10 11:45:31.596  dcgw rpd[13815]: bgp_pp_recv:4663: NOTIFICATION sent to 10.10.10.7+46021 (proto): code 2 (Open Message Error) subcode 2 (bad peer AS number), Reason: no group for 10.10.10.7+46021 (proto) from AS 65002 found (peer as mismatch) in master(irb.10), dropping him
Jun 10 11:46:03.604  dcgw rpd[13815]: bgp_pp_recv:4663: NOTIFICATION sent to 10.10.10.7+52428 (proto): code 2 (Open Message Error) subcode 2 (bad peer AS number), Reason: no group for 10.10.10.7+52428 (proto) from AS 65002 found (peer as mismatch) in master(irb.10), dropping him
Jun 10 11:46:35.612  dcgw rpd[13815]: bgp_pp_recv:4663: NOTIFICATION sent to 10.10.10.7+37316 (proto): code 2 (Open Message Error) subcode 2 (bad peer AS number), Reason: no group for 10.10.10.7+37316 (proto) from AS 65002 found (peer as mismatch) in master(irb.10), dropping him
Jun 10 11:46:41.476  dcgw rpd[13815]: bgp_process_open:4112: NOTIFICATION sent to 10.10.10.7 (Internal AS 64999): code 2 (Open Message Error) subcode 2 (bad peer AS number), Reason: peer 10.10.10.7 (Internal AS 64999) claims 65002 to be the value whereas 64999 is configured
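
When the log is long, the affected peers can be picked out programmatically. The following is a minimal Python sketch, not a Juniper tool: the regular expression and the function name find_as_mismatches are assumptions derived only from the sample log lines above, and other Junos releases may word these messages differently.

```python
import re

# Illustrative pattern based solely on the rpd NOTIFICATION lines quoted
# above; it is an assumption, not an official message format.
MISMATCH_RE = re.compile(
    r"NOTIFICATION sent to (?P<peer>\d+\.\d+\.\d+\.\d+)"
    r".*subcode 2 \(bad peer AS number\)"
    r".*?(?:from AS (?P<from_as>\d+)|claims (?P<claimed_as>\d+))"
)


def find_as_mismatches(lines):
    """Return a set of (peer_ip, advertised_as) pairs found in the log lines."""
    hits = set()
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            # The AS appears either as "from AS N" or as "claims N".
            asn = match.group("from_as") or match.group("claimed_as")
            hits.add((match.group("peer"), int(asn)))
    return hits
```

Feeding it the output of "show log messages" (for example, saved to a file and read line by line) yields each peer address together with the AS number that peer advertised.
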
Cause:

When a Contrail controller is rebooted or the contrail_control_provisioner container is restarted, the value of "local_autonomous_system" changes to null in the local controller configuration. This causes the BGP peering with the Data Center gateway router to be lost.

As a result, the controller sends its BGP OPEN messages to the DCGW with AS number 65002 instead of 64999, the AS number that the DCGW has configured for this peer. The DCGW therefore rejects the session and logs a "peer as mismatch" message.

The issue can be verified from the contrail-api logs, where the bgp_router update for the affected controller shows "local_autonomous_system": null:

[root@contrail-controller-2 config-api]# tailf contrail-api-0.log|grep -i bgp

06/22/2021 12:13:57.673 7fee924d3cd0 [contrail-api] [INFO]: default [SYS_INFO]: VncApiConfigLog: api_log = <<  identifier_uuid = 62680f5b-33a0-4086-86fb-359984f1ce23  object_type  = bgp_router  identifier_name = default-domain:default-project:ip-fabric:default:contrail-controller-0.internalapi.localdomain  url = http://10.10.10.8:8082/bgp-router/62680f5b-33a0-4086-86fb-359984f1ce23  operation = http_put  useragent = Restler  for node.js  remote_ip = 10.10.10.254  body = {"bgp-router": {"fq_name": ["default-domain", "default-project", "ip-fabric", "default", "contrail-controller-0.internalapi.localdomain"], "uuid": "62680f5b-33a0-4086-86fb-359984f1ce23", "parent_type": "routing-instance", "bgp_router_parameters": {"vendor": "contrail", "admin_down": false, "local_autonomous_system": null, "autonomous_system": 65002, "auth_data": null, "address": "10.10.10.7", "router_type": "control-node", "identifier": "10.10.10.7", "hold_time": 90, "port":  179, "address_families": {"family": ["route-target", "inet-vpn", "e-vpn", "erm-vpn", "inet6-vpn"]}}, "tag_refs": [], "bgp_router_refs": [{"to": ["default-domain", "default-project", "ip-fabric", "default", "contrail-controller-2.internalapi.localdomain"],  
<clipped>
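
Because the log entry is long and its JSON body is clipped, the two AS fields are easiest to read with a pattern match rather than a JSON parser. A minimal Python sketch follows; the function name extract_as_fields is hypothetical, and the pattern assumes the field layout seen in the log line above.

```python
import re

# Matches the two AS fields as they appear in the contrail-api log body.
# The leading quote keeps "autonomous_system" from matching inside
# "local_autonomous_system".
FIELD_RE = re.compile(
    r'"(local_autonomous_system|autonomous_system)":\s*(null|\d+)'
)


def extract_as_fields(log_line):
    """Return a dict of the AS fields found; JSON null becomes None."""
    fields = {}
    for name, value in FIELD_RE.findall(log_line):
        fields[name] = None if value == "null" else int(value)
    return fields
```

A result in which local_autonomous_system is None while autonomous_system is 65002 corresponds to the faulty state described in this article.
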
Solution:

To work around this problem, manually set the AS number that is used for peering with the Data Center gateway router in the provision_control.py script inside the contrail_control_provisioner container. Open the script and locate the defaults dictionary:

[root@contrail-controller-2 heat-admin]# podman exec -it contrail_control_provisioner bash
(control-provisioner)[root@contrail-controller-2 /]$ vi /opt/contrail/utils/provision_control.py

<snipped>
    conf_parser = argparse.ArgumentParser(add_help=False)

    conf_parser.add_argument("-c", "--conf_file",
                             help="Specify config file", metavar="FILE")
    args, remaining_argv = conf_parser.parse_known_args(args_str.split())

    defaults = {
        'router_asn': '64512',
        'enable_4byte_as': None,
        'bgp_server_port': 179,
        'local_autonomous_system': "None",
        'ibgp_auto_mesh': None,
        'api_server_ip': '127.0.0.1',
        'api_server_port': '8082',
        'api_server_use_ssl': False,
        'oper': None,
        'admin_user': None,
        'admin_password': None,
        'admin_tenant_name': None,
        'md5': None,
        'graceful_restart_time': 300,
        'long_lived_graceful_restart_time': 300,
        'end_of_rib_timeout': 300,
        'graceful_restart_bgp_helper_enable': False,
        'graceful_restart_xmpp_helper_enable': False,
        'graceful_restart_enable': False,
        'set_graceful_restart_parameters': False,
        'sub_cluster_name': None,
        'peer_list': None,

Modify the value of 'local_autonomous_system': "None" to 'local_autonomous_system': "64999" (AS number of the DCGW router) in the provision_control.py file. After the changes are made, the provision_control.py defaults should appear as follows:

    conf_parser = argparse.ArgumentParser(add_help=False)
    conf_parser.add_argument("-c", "--conf_file",
                             help="Specify config file", metavar="FILE")
    args, remaining_argv = conf_parser.parse_known_args(args_str.split())

    defaults = {
        'router_asn': '64512',
        'enable_4byte_as': None,
        'bgp_server_port': 179,
        'local_autonomous_system': "64999",
        'ibgp_auto_mesh': None,
        'api_server_ip': '127.0.0.1',
        'api_server_port': '8082',
        'api_server_use_ssl': False,
        'oper': None,
        'admin_user': None,
        'admin_password': None,
        'admin_tenant_name': None,
        'md5': None,
        'graceful_restart_time': 300,
        'long_lived_graceful_restart_time': 300,
        'end_of_rib_timeout': 300,
        'graceful_restart_bgp_helper_enable': False,
        'graceful_restart_xmpp_helper_enable': False,
        'graceful_restart_enable': False,
        'set_graceful_restart_parameters': False,
        'sub_cluster_name': None,
        'peer_list': None,
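
The same one-line change can be expressed as a text substitution instead of an interactive edit. The sketch below is only an illustration under stated assumptions: the function name patch_local_as is hypothetical, the pattern assumes the default is written exactly as in the listing above, and the function operates on a string so that reading and writing the file (and keeping a backup) is left to the operator.

```python
import re

# Matches the default entry as shown in the listing above, with the value
# written either as "None" (quoted) or as a bare None.
DEFAULT_LINE_RE = re.compile(
    r"""(['"]local_autonomous_system['"]\s*:\s*)(['"]?)None\2"""
)


def patch_local_as(source_text, asn):
    """Return (patched_text, replacement_count) with the default set to asn."""
    asn = int(asn)
    # Sanity check only: AS numbers fit in 32 bits.
    if not 1 <= asn <= 4294967295:
        raise ValueError("AS number out of range: %d" % asn)
    return DEFAULT_LINE_RE.subn(
        lambda m: '%s"%d"' % (m.group(1), asn), source_text
    )
```

A replacement count of 0 means the expected default line was not found, in which case the file should be inspected by hand rather than overwritten.
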

After saving the file, restart the contrail_control_provisioner container and verify that the BGP peering has been restored. In the contrail-api log, confirm that local_autonomous_system now shows the value set in the Python file:

[root@contrail-controller-2 config-api]# tailf contrail-api-0.log|grep -i bgp
06/22/2021 12:17:23.797 7fee91a73eb0 [contrail-api] [INFO]: __default__ [SYS_INFO]: VncApiConfigLog: api_log = <<  identifier_uuid = 62680f5b-33a0-4086-86fb-359984f1ce23  object_type = bgp_router  identifier_name = default-domain:default-project:ip-fabric:__default__:dcl01-contrail-controller-0.internalapi.localdomain  url = http://192.168.4.8:8082/bgp-router/62680f5b-33a0-4086-86fb-359984f1ce23  operation = http_put  useragent = Restler for node.js  remote_ip = 10.247.136.137  body = {"bgp-router": {"fq_name": ["default-domain", "default-project", "ip-fabric", "__default__", "dcl01-contrail-controller-0.internalapi.localdomain"], "uuid": "62680f5b-33a0-4086-86fb-359984f1ce23", "parent_type": "routing-instance", "bgp_router_parameters": {"vendor": "contrail", "admin_down": false, "local_autonomous_system": 64999, "autonomous_system": 65002, "auth_data": null, "address": "192.168.4.16", "router_type": "control-node", "identifier": "192.168.4.16", "hold_time": 90, "port": 179, "address_families": {"family": ["route-target", "inet-vpn", "e-vpn", "erm-vpn", "inet6-vpn"]}}, "tag_refs": [], "bgp_router_refs": [{"to": ["default-domain", "default-project", "ip-fabric", "__default__", "contrail-controller-2.internalapi.localdomain"],

Note: "local_autonomous_system" is the AS number for the peer gateway router (DCGW).

root@dcgw> show bgp summary |match 10.10.10
10.10.10.7           64999         13         12       0      16        3:01 Establ
10.10.10.8           64999      12096      14177       0       9 3d 17:08:30 Establ
10.10.10.16          64999      12398      14189       0       9 3d 17:14:49 Establ
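
The final check can also be scripted against captured CLI output. This is a minimal Python sketch with a hypothetical function name, peer_states; it assumes the State column reads literally "Establ" for an established peer, as in the output above, which may not hold for every Junos output format.

```python
def peer_states(summary_output):
    """Map each peer address to True if its row reports 'Establ'."""
    states = {}
    for line in summary_output.splitlines():
        tokens = line.split()
        # Peer rows start with a dotted IPv4 address followed by the peer AS;
        # prompts and other lines are skipped.
        if len(tokens) < 3 or tokens[0].count(".") != 3 or not tokens[1].isdigit():
            continue
        states[tokens[0]] = "Establ" in tokens[2:]
    return states
```

Any peer mapped to False is still down and warrants another look at the DCGW log and the contrail-api log.
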
