Bug
Resolution: Not a Bug
rhel-9
rhel-net-ovn
ssg_networking
Problem Description: Clearly explain the issue.
The feature described at https://ovn-kubernetes.io/okeps/okep-5224-connecting-udns/okep-5224-connecting-udns/#services
is being implemented in https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5828
As part of this feature, we are adding three hierarchical ACLs, all in tier 0:
- a `pass` ACL at priority 500 for traffic whose destination is the service CIDR
- a `pass` ACL at priority 475 for traffic whose destination is in the same network
- a `drop` ACL at priority 450 for traffic whose destination is in the address sets of network CIDRs referenced by its match
Example ACLs:
_uuid : 2ba0a82a-399d-4caf-af5e-0a3ab359361d
action : drop
direction : from-lport
external_ids : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:drop-pod", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=drop-pod}
label : 0
log : false
match : "ip4.dst == $a10198858957931958511 || ip6.dst == $a10198861156955214933"
meter : acl-logging
name : []
options : {}
priority : 450
sample_est : []
sample_new : []
severity : []
tier : 0
===
_uuid : aafe9da6-cc2e-4179-9269-3ec11382960d
action : pass
direction : from-lport
external_ids : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:allow-same-network-15", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=allow-same-network-15}
label : 0
log : false
match : "ip4.dst == 10.128.0.0/16 || ip6.dst == 2014:100:200::/60"
meter : acl-logging
name : []
options : {}
priority : 475
sample_est : []
sample_new : []
severity : []
tier : 0
===
_uuid : 81d6582f-b56d-4e4a-8410-cbf7e7bf7fdf
action : pass
direction : from-lport
external_ids : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:allow-service", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=allow-service}
label : 0
log : false
match : "(ip4.dst == 10.96.0.0/16 || ip6.dst == fd00:10:96::/112)"
meter : acl-logging
name : []
options : {}
priority : 500
sample_est : []
sample_new : []
severity : []
tier : 0
===
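To make the intended evaluation order concrete, here is a minimal Python sketch of how the three tier 0 ACLs above would be expected to resolve for a given destination, assuming the highest-priority matching ACL in a tier decides the action. The CIDRs of the two `pass` ACLs are taken from the dumps above; the contents of the address sets referenced by the drop ACL are not shown in this report, so `CONNECTED_CIDRS` is a hypothetical stand-in. The sketch only models destination matching, not conntrack state or load balancing.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Acl:
    name: str
    priority: int
    action: str        # "pass" or "drop"
    dst_cidrs: tuple   # destination CIDRs covered by the match


# ASSUMPTION: hypothetical contents of the address sets used by the drop ACL.
CONNECTED_CIDRS = ("10.128.0.0/16", "10.132.0.0/16")

TIER0 = [
    Acl("allow-service", 500, "pass", ("10.96.0.0/16", "fd00:10:96::/112")),
    Acl("allow-same-network-15", 475, "pass",
        ("10.128.0.0/16", "2014:100:200::/60")),
    Acl("drop-pod", 450, "drop", CONNECTED_CIDRS),
]


def matches(acl, dst):
    """Return True if dst falls inside any CIDR of the ACL's match."""
    addr = ip_address(dst)
    for cidr in acl.dst_cidrs:
        net = ip_network(cidr)
        if addr.version == net.version and addr in net:
            return True
    return False


def evaluate_tier(acls, dst):
    """Return (name, action) of the highest-priority matching ACL, if any."""
    for acl in sorted(acls, key=lambda a: a.priority, reverse=True):
        if matches(acl, dst):
            return acl.name, acl.action
    return None, None


for dst in ("10.96.88.94", "10.128.3.4", "10.132.0.7", "192.0.2.1"):
    print(dst, evaluate_tier(TIER0, dst))
```

Under these assumptions, the service VIP resolves to `allow-service`, a same-network pod IP to `allow-same-network-15`, and only a destination in another connected network reaches `drop-pod`.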
The expectation is that a curl to a service works, but it does not work today. When we add `ct.new` to the match of the drop ACL, things work as expected, but `ct.new` shouldn't be required at all.
The service VIPs are being dropped by the drop ACL because, without `ct.new`, the drop ACL matches ALL traffic to the connected subnets, including DNAT'd return traffic from service VIPs. The flow is:
- Pod sends to the service VIP (e.g., 10.96.x.x) → the allow-service ACL passes it (priority 500)
- OVN LB DNATs the VIP to the backend pod IP (e.g., 10.128.x.x)
- The DNAT'd packet now has dst == 10.128.x.x (a pod subnet), but the ACL won't be applied here since apply-after-lb is set to false
- The reply should take the same path back
Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).
The workaround is to add `ct.new` to the match of the drop ACL:
_uuid : 2ba0a82a-399d-4caf-af5e-0a3ab359361d
action : drop
direction : from-lport
external_ids : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:drop-pod", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=drop-pod}
label : 0
log : false
match : "(ip4.dst == $a10198858957931958511 || ip6.dst == $a10198861156955214933) && ct.new"
meter : acl-logging
name : []
options : {}
priority : 450
sample_est : []
sample_new : []
severity : []
tier : 0
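The difference between the original drop match and the workaround can be sketched in Python as two predicates. This is only an illustration of the match logic: the address-set contents in `DROP_CIDRS` are hypothetical (the report does not list them), and the sketch ignores the higher-priority `pass` ACLs and where in the pipeline the ACL is evaluated.

```python
from ipaddress import ip_address, ip_network

# ASSUMPTION: hypothetical contents of the address sets in the drop ACL match.
DROP_CIDRS = [ip_network("10.128.0.0/16"), ip_network("10.132.0.0/16")]


def dst_in_drop_set(dst):
    """Return True if dst is inside one of the (assumed) connected CIDRs."""
    addr = ip_address(dst)
    return any(addr.version == net.version and addr in net
               for net in DROP_CIDRS)


def drop_original(dst, ct_new):
    """Original match: destination only, conntrack state is ignored."""
    return dst_in_drop_set(dst)


def drop_workaround(dst, ct_new):
    """Workaround match: destination AND ct.new."""
    return dst_in_drop_set(dst) and ct_new


# First packet of a new connection vs. a packet of an established one
# (for example a reply), both destined to a pod IP in the set.
first_packet = {"dst": "10.128.3.4", "ct_new": True}
later_packet = {"dst": "10.128.3.4", "ct_new": False}

for name, pkt in (("first", first_packet), ("later", later_packet)):
    print(name, drop_original(**pkt), drop_workaround(**pkt))
```

With the original match, both packets are candidates for the drop; with `ct.new` added, only the first packet of a new connection is.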
Without the workaround, a curl to the service VIP times out:
$ oc rsh -n black-ns0-4xzlx black-pod-0
/ $ curl http://10.96.88.94:8080/hostname --max-time 5
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
/ $ curl http://10.96.88.94:8080/hostname --max-time 5
curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received
Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).
25.09
Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
Expected Behavior: Describe what should happen under normal circumstances.
Observed Behavior: Explain what actually happens.
Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.