
[Contrail] Corrupt Security Group due to missing egress-access-control-list entry


Article ID: KB35546   KB Last Updated: 11 Mar 2020   Version: 1.0
Summary:

Current Contrail releases have no roll-back mechanism for the case where creating or deleting a security group (SG) raises an exception because of a Contrail API or schema transformer timeout. The failed operation can leave stale entries in Zookeeper and in the config database. A later attempt to create a security group with the same name then fails because the stale entry still exists in Zookeeper. This article describes an example failure and the software enhancement planned for future Contrail releases.

Symptoms:

ICMPv6 traffic does not pass between two virtual machines even after an 'allow-all' security group policy is applied. The flow entries created by the ICMPv6 traffic show that the packets are dropped by the security group (Action:D(SG)), as in the listing below.

$ flow --match "2001:1890:fc:1803::1,2001:1890:fc:1803::3:1"

Listing flows matching ([2001:1890:fc:1803::1]:*, [2001:1890:fc:1803::3:1]:*) 
 
    Index                Source:Port/Destination:Port                      Proto(V) 
----------------------------------------------------------------------------------- 
   922896<=>1971768      2001:1890:fc:1803::3:1:26242                       58 (3) 
                         2001:1890:fc:1803::1:129 
(Gen: 1, K(nh):49, Action:D(SG), Flags:, QOS:-1, S(nh):49,  Stats:17/2006, 
 SPort 59772, TTL 0, Sinfo 6.0.0.0) 
 
  1971768<=>922896       2001:1890:fc:1803::1:26242                         58 (3) 
                         2001:1890:fc:1803::3:1:129 
(Gen: 1, K(nh):49, Action:H, Flags:, QOS:-1, S(nh):8,  Stats:0/0,  SPort 53079, 
 TTL 0, Sinfo 0.0.0.0)
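
When many flows are listed, the SG drops can be picked out with a small script. The following is a minimal Python 3 sketch, assuming the output layout matches the listing above; the function name find_sg_drops and the regular expressions are illustrative and not part of any Contrail tool.

```python
import re

# First line of a flow entry, e.g. "922896<=>1971768  <source>:<port>  58 (3)".
ENTRY_RE = re.compile(r"^\s*(\d+)<=>(\d+)\s+(\S+)\s+(\d+)\s+\(\d+\)")
# Action field inside the parenthesised detail line, e.g. "Action:D(SG)".
ACTION_RE = re.compile(r"Action:([A-Z]+(?:\([^)]*\))?)")


def find_sg_drops(flow_output):
    """Return (index, reverse_index, source) for flows whose action is D(SG)."""
    drops = []
    current = None
    for line in flow_output.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            current = (int(entry.group(1)), int(entry.group(2)), entry.group(3))
            continue
        action = ACTION_RE.search(line)
        if action and current is not None:
            if action.group(1) == "D(SG)":
                drops.append(current)
            current = None
    return drops


# Sample copied from the flow listing in this article.
SAMPLE = """\
   922896<=>1971768      2001:1890:fc:1803::3:1:26242                       58 (3)
                         2001:1890:fc:1803::1:129
(Gen: 1, K(nh):49, Action:D(SG), Flags:, QOS:-1, S(nh):49,  Stats:17/2006,
 SPort 59772, TTL 0, Sinfo 6.0.0.0)

  1971768<=>922896       2001:1890:fc:1803::1:26242                         58 (3)
                         2001:1890:fc:1803::3:1:129
(Gen: 1, K(nh):49, Action:H, Flags:, QOS:-1, S(nh):8,  Stats:0/0,  SPort 53079,
 TTL 0, Sinfo 0.0.0.0)
"""

drops = find_sg_drops(SAMPLE)
print(drops)
```

Only the first flow is reported, because the reverse flow has action H (hold) rather than D(SG).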

When the SG configuration is checked through the API on port 8095, the non-working SG is missing the egress-access-control-list, whereas a normal SG has both an ingress-access-control-list and an egress-access-control-list. Removing and re-adding the egress rule through the web UI does not help: the egress-access-control-list is still missing from the broken SG's configuration.

Broken SG:

# curl -u xxxxx:xxxxx http://localhost:8095/security-group/0bbcf642-1052-4d2b-84a6-77afc2276926 | python -mjson.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19466 100 19466 0 0 1791k 0 --:--:-- --:--:-- --:--:-- 1900k
{
"security-group": {
"access_control_lists": [
{
"href": "http://localhost:8095/access-control-list/617efa05-a145-4ce5-9981-966d7566d229",
"to": [
"default-domain",
"SHAKEN-27300-P-01-bot2b",
"bot2bssas0001v_vrf_sg",
"ingress-access-control-list" <-- only ingress access control list exists
],
"uuid": "617efa05-a145-4ce5-9981-966d7566d229"
}
],

A normal SG:

# curl -u xxxx:xxxxx http://localhost:8095/security-group/52668c39-7d65-4db3-b9b6-9f9a6afe9982 | python -mjson.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 6843 100 6843 0 0 751k 0 --:--:-- --:--:-- --:--:-- 835k
{
"security-group": {
"access_control_lists": [
{
"href": "http://localhost:8095/access-control-list/7fb9f858-10f8-4b64-bad9-d64c1e5feba0",
"to": [
"default-domain",
"SHAKEN-27300-P-01-bot2b",
"test",
"ingress-access-control-list"
],
"uuid": "7fb9f858-10f8-4b64-bad9-d64c1e5feba0"
},
{
"href": "http://localhost:8095/access-control-list/d74020db-9484-4048-8239-433a97976d5d", <-- both ingress and egress exist
"to": [
"default-domain",
"SHAKEN-27300-P-01-bot2b",
"test",
"egress-access-control-list"
],
"uuid": "d74020db-9484-4048-8239-433a97976d5d"
}
],
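
The comparison between the broken and the normal SG can be automated once the JSON has been fetched. The sketch below is an illustration in Python 3 and assumes the response has the shape shown in the two excerpts above (a "security-group" object with an "access_control_lists" array whose "to" list ends in the ACL name). The helper name missing_acls is hypothetical.

```python
EXPECTED_ACLS = ("ingress-access-control-list", "egress-access-control-list")


def missing_acls(sg_response):
    """Return the expected ACL names that the security group does not reference."""
    sg = sg_response.get("security-group", {})
    present = {
        ref["to"][-1]
        for ref in sg.get("access_control_lists", [])
        if ref.get("to")
    }
    return [name for name in EXPECTED_ACLS if name not in present]


# Trimmed copy of the broken SG shown in this article.
broken_sg = {
    "security-group": {
        "access_control_lists": [
            {
                "to": [
                    "default-domain",
                    "SHAKEN-27300-P-01-bot2b",
                    "bot2bssas0001v_vrf_sg",
                    "ingress-access-control-list",
                ],
                "uuid": "617efa05-a145-4ce5-9981-966d7566d229",
            }
        ]
    }
}

print(missing_acls(broken_sg))
```

An empty result means both ACLs are referenced; a non-empty result names the ACL that the schema transformer failed to create.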
Cause:

In the API log, there is a creation message for the ingress ACL only; no egress ACL creation message appears.

172.29.16.84 - - [2020-01-28 18:26:08] "GET /project/2b18eab3-f1b1-4cf0-8cea-d372036e8782?exclude_back_refs=True&exclude_children=True HTTP/1.1" 200 1035 0.108434
DEBUG:api-0:Sending request(xid=2445): Create(path='/fq-name-to-uuid/access_control_list:default-domain:SHAKEN-27300-P-01-bot2b:bot2bssas0001v_vrf_sg:ingress-access-control-list', data='617efa05-a145-4ce5-9981-966d7566d229', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=0)
DEBUG:api-0:Received response(xid=2445): u'/fq-name-to-uuid/access_control_list:default-domain:SHAKEN-27300-P-01-bot2b:bot2bssas0001v_vrf_sg:ingress-access-control-list'

 

The schema log shows an error indicating that creation of the egress ACL failed because an egress ACL with the same fully qualified name already exists in the config DB, under the UUID shown below:

639 'egress-access-control-list'] already exists with uuid: 0f35260c-aab9-4b07-b232-56ba717c2fbd
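
The UUID of the orphaned ACL is needed later for the manual cleanup. As a rough Python 3 sketch, it can be extracted from the schema log by matching the "already exists with uuid" text seen in the line above. Only that fragment of the log format is known from this article, so the pattern deliberately ignores the rest of the line; the name find_stale_acls is hypothetical.

```python
import re

# Matches "...'egress-access-control-list'] already exists with uuid: <uuid>"
# (or the ingress equivalent) and captures the ACL name and the UUID.
STALE_RE = re.compile(
    r"'((?:in|e)gress-access-control-list)'\]\s+already exists with uuid:\s*"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def find_stale_acls(log_text):
    """Return (acl_name, uuid) pairs for every 'already exists' error found."""
    return [(m.group(1), m.group(2)) for m in STALE_RE.finditer(log_text)]


# Log fragment copied from this article.
log_line = (
    "639 'egress-access-control-list'] already exists with uuid: "
    "0f35260c-aab9-4b07-b232-56ba717c2fbd"
)
stale = find_stale_acls(log_line)
print(stale)
```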

The schema log also shows that an earlier deletion of this egress ACL failed because of a schema transformer timeout, which left the orphaned egress ACL in the config DB:

<class 'cfgm_common.exceptions.TimeOutError'>
Python 2.7.6: /usr/bin/python
Tue Dec 17 20:53:03 2019

A problem occurred in a Python script.  Here is the sequence of function calls leading up to the error, in the order they occurred:

 /usr/lib/python2.7/dist-packages/cfgm_common/vnc_amqp.py in _vnc_subscribe_callback(self=<schema_transformer.st_amqp.STAmqpHandle object>, oper_info={u'imid': u'contrail:security-group:default-domain:SHAKEN-27300-P-01-bot2b:bot2bssas0001v_vrf_sg', u'obj_dict': {u'access_control_lists': [{u'to': [u'default-domain', u'SHAKEN-27300-P-01-bot2b', u'bot2bssas0001v_vrf_sg', u'ingress-access-control-list'], u'uuid': u'13ef238b-bb17-407c-bb44-ac83c7a32dc3'}, {u'to': [u'default-domain', u'SHAKEN-27300-P-01-bot2b', u'bot2bssas0001v_vrf_sg', u'egress-access-control-list'], u'uuid': u'0f35260c-aab9-4b07-b232-56ba717c2fbd'}], u'display_name': u'bot2bssas0001v_vrf_sg', u'fq_name': [u'default-domain', u'SHAKEN-27300-P-01-bot2b', u'bot2bssas0001v_vrf_sg'], u'id_perms': {u'created': u'2019-12-04T17:54:10.302929', u'creator': None, u'description': u'', u'enable': True, u'last_modified': u'2019-12-17T20:50:02.565025', u'permissions': {u'group': u'admin', u'group_access': 7, u'other_access': 7, u'owner': u'm95682', u'owner_access': 7}, u'user_visible': True, u'uuid': {u'uuid_lslong': 12488781031113658163L, u'uuid_mslong': 17391345103892335617L}}, u'parent_type': u'project', u'parent_uuid': u'7f9d2048-27d0-496b-9934-fbaa08c59355', u'perms2': {u'global_access': 0, u'owner': u'7f9d204827d0496b9934fbaa08c59355', u'owner_access': 7, u'share': []}, u'security_group_entries': {u'policy_rule': []}, u'security_group_id': 8000027, u'uuid': u'f15a783a-8bbb-4c01-ad51-102ded2a8f33'}, u'oper': u'DELETE', u'parent_imid': u'contrail:project:default-domain:SHAKEN-27300-P-01-bot2b', u'type': u'security_group', u'uuid': u'f15a783a-8bbb-4c01-ad51-102ded2a8f33'})​
Solution:

In this failure scenario, where either the ingress or the egress ACL is missing, there are two options:

  1. Create an SG with a name different from that of the corrupted SG.

  2. Manually delete the stale entry through the API or the Contrail config editor, then delete the corrupt SG, and finally create a new SG with the same name.
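
The order of operations in option 2 matters, so it can help to write the plan down before running anything. The Python 3 sketch below only builds the list of requests and sends nothing. The endpoint shapes are assumptions: the per-object URLs follow the pattern visible in the curl output in this article, while the HTTP methods and the /security-groups collection path are assumed from general REST conventions and should be verified against the API documentation for the installed release. The function name cleanup_plan is hypothetical.

```python
def cleanup_plan(api_base, stale_acl_uuid, sg_uuid, sg_fq_name):
    """Build the ordered (method, url, body) steps for option 2.

    Nothing is sent; the caller reviews the plan and issues the requests
    with the tool of their choice (for example curl with credentials).
    """
    base = api_base.rstrip("/")
    # Minimal body for re-creating the SG under the same name (assumed shape).
    new_sg = {
        "security-group": {
            "fq_name": list(sg_fq_name),
            "parent_type": "project",
        }
    }
    return [
        # 1. Remove the orphaned ACL left behind by the timed-out deletion.
        ("DELETE", f"{base}/access-control-list/{stale_acl_uuid}", None),
        # 2. Remove the corrupt SG itself.
        ("DELETE", f"{base}/security-group/{sg_uuid}", None),
        # 3. Re-create the SG with the original name.
        ("POST", f"{base}/security-groups", new_sg),
    ]


# UUIDs and names taken from the example in this article.
plan = cleanup_plan(
    "http://localhost:8095/",
    "0f35260c-aab9-4b07-b232-56ba717c2fbd",
    "0bbcf642-1052-4d2b-84a6-77afc2276926",
    ["default-domain", "SHAKEN-27300-P-01-bot2b", "bot2bssas0001v_vrf_sg"],
)
for method, url, body in plan:
    print(method, url)
```

After the final step, checking the new SG again for both the ingress and egress ACLs confirms that the schema transformer completed the creation.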
