-
Bug
-
Resolution: Won't Do
-
Major
-
None
-
4.12.z
-
Moderate
-
No
-
SDN Sprint 246, SDN Sprint 247, SDN Sprint 248, SDN Sprint 254, SDN Sprint 255, SDN Sprint 256, SDN Sprint 257, SDN Sprint 258
-
8
-
False
-
-
Release Note Not Required
-
In Progress
-
-
-
08/27: Not an issue in 4.14. The upstream PR is failing some tests. Two bugs are tied to this PR, for 4.12 and 4.13. Reaching out to Peri and Zshi.
-
-
Description of problem:
A customer who was previously hit by a bug that left stale and duplicated SNATs for egressIPs has upgraded to 4.12.40, which includes many fixes for these issues. On some clusters, however, they still see the same problems occurring at an alarmingly high rate.
Looking at one example we have this egressIP:
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  creationTimestamp: "2023-06-20T14:10:09Z"
  generation: 80
  name: intm-pcpos-mbb.egressip
  resourceVersion: "836331030"
  uid: 8d7d3090-a28d-47e1-aa89-673f4d30c944
spec:
  egressIPs:
  - 172.18.159.22
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: intm-pcpos-mbb
  podSelector: {}
status:
  items:
  - egressIP: 172.18.159.22
    node: iepumnosw602.epu.corpintra.net
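When we look at the NBDB, not a single SNAT has been created for a logical port on GR_iepumnosw602.epu.corpintra.net, the node the egressIP is assigned to. Every SNAT found for this IP references logical_port k8s-iepumnosw601.epu.corpintra.net: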
$ sudo ovn-nbctl find nat external_ip=172.18.159.22
_uuid : 16e03a1e-b303-4a34-ab32-63bd7335a92b
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.241.5.12"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
_uuid : e94bc46e-bf6f-4c0f-9cad-40912dd21681
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.243.0.40"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
_uuid : 045530e3-92da-49b1-8e03-eca31504515f
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.241.2.134"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
_uuid : af886107-3fae-4671-9fe3-abc11b9cae78
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.241.2.137"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
_uuid : 377d695a-da42-4274-b0ab-2ae72d5e75ba
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.243.0.41"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
_uuid : 4d8ae9c0-e442-4747-82da-f88c03c87a4e
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.240.4.55"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
_uuid : 5e919dfe-4d07-4cc9-88a3-070829e10d19
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {name=intm-pcpos-mbb.egressip}
external_ip : "172.18.159.22"
external_mac : []
external_port_range : ""
gateway_port : []
logical_ip : "10.240.4.54"
logical_port : k8s-iepumnosw601.epu.corpintra.net
options : {stateless="false"}
type : snat
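To make the mismatch easier to spot on other clusters, the output above can be tallied per logical_port. The following is a minimal sketch, not part of the original report: `parse_records` and `snats_per_port` are hypothetical helper names, and the parser assumes the flattened `column : value` layout shown above, with each record starting at its `_uuid` line.

```python
from collections import Counter

def parse_records(text):
    """Split `ovn-nbctl find`-style output into one dict per record."""
    records = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "column : value" line
        key, value = key.strip(), value.strip().strip('"')
        if key == "_uuid":
            records.append({})  # each record starts with its _uuid column
        if records:
            records[-1][key] = value
    return records

def snats_per_port(records):
    """Count SNAT records by the logical_port they reference."""
    return Counter(r.get("logical_port") for r in records if r.get("type") == "snat")

# Two of the records from the output above, trimmed to the relevant columns.
SAMPLE = """\
_uuid : 16e03a1e-b303-4a34-ab32-63bd7335a92b
external_ip : "172.18.159.22"
logical_ip : "10.241.5.12"
logical_port : k8s-iepumnosw601.epu.corpintra.net
type : snat
_uuid : e94bc46e-bf6f-4c0f-9cad-40912dd21681
external_ip : "172.18.159.22"
logical_ip : "10.243.0.40"
logical_port : k8s-iepumnosw601.epu.corpintra.net
type : snat
"""

if __name__ == "__main__":
    for port, count in snats_per_port(parse_records(SAMPLE)).items():
        print(port, count)
```

Feeding the full command output into `parse_records` and comparing the resulting ports with the node in `status.items` of the EgressIP should show at a glance whether the SNATs reference the assigned node.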
We also see that SNATs were created for only a few pods; the other pods in the project did not get any egressIP SNAT at all. In total, these logical port bindings exist for the project:
addresses : ["0a:58:0a:f3:00:29 10.243.0.41"]
name : intm-pcpos-mbb_mbb-apropos-service-78586dbfc6-k8sw6
options : {iface-id-ver="759b1468-c1e9-4255-ace1-3db26a487af3", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f1:02:87 10.241.2.135"]
name : intm-pcpos-mbb_mbb-antrag-neu-frontend-56d9b54c57-ccglw
options : {iface-id-ver="b2e65c4b-eb2e-4f97-88aa-10f0b08c3406", requested-chassis=iepumnosw604.epu.corpintra.net}
addresses : ["0a:58:0a:f0:04:37 10.240.4.55"]
name : intm-pcpos-mbb_mbb-antrag-verteiler-service-b79cbb8cb-ftkdg
options : {iface-id-ver="f8183495-c74f-4b4d-9ed5-837ff4173a2b", requested-chassis=iepumnosw602.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:2a 10.243.0.42"]
name : intm-pcpos-mbb_pcneo-maintenance-page-564cf7b54b-xcrxz
options : {iface-id-ver="28bc2d7a-2106-4118-b16a-215e043c6bd3", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f1:04:49 10.241.4.73"]
name : intm-pcpos-mbb_mbb-antrag-gebraucht-frontend-85d9664d85-rcp79
options : {iface-id-ver="9e9bb50d-30b8-47a1-9508-232479f163cb", requested-chassis=iepumnosw605.epu.corpintra.net}
addresses : ["0a:58:0a:f1:02:85 10.241.2.133"]
name : intm-pcpos-mbb_mbb-antrag-neu-b2b-frontend-947b77f78-xfnvc
options : {iface-id-ver="6e74e70b-8751-4ee3-b11e-c515d6a791f2", requested-chassis=iepumnosw604.epu.corpintra.net}
addresses : ["0a:58:0a:f1:03:12 10.241.3.18"]
name : intm-pcpos-mbb_kong-6dff54c998-5zbc5
options : {iface-id-ver="5c25bcdf-0aab-4bbe-b5c8-afed608e13df", requested-chassis=iepumnosw604.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:dd 10.243.0.221"]
name : intm-pcpos-mbb_mbb-antrag-gebraucht-frontend-85d9664d85-g9t44
options : {iface-id-ver="1f95ec7e-9e75-4527-8c10-0d782aaa6bfd", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f0:04:36 10.240.4.54"]
name : intm-pcpos-mbb_mbb-antrag-gebraucht-service-559b766bdd-g9rxb
options : {iface-id-ver="55470a8b-1811-4d44-a79e-d27148fe1568", requested-chassis=iepumnosw602.epu.corpintra.net}
addresses : ["0a:58:0a:f1:02:86 10.241.2.134"]
name : intm-pcpos-mbb_mbb-antrag-neu-b2b-service-54575fcd89-wd8vr
options : {iface-id-ver="60c4676a-494b-4b1d-b370-993160d1c198", requested-chassis=iepumnosw604.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:c0 10.243.0.192"]
name : intm-pcpos-mbb_mbb-antrag-gebraucht-service-559b766bdd-pf9lf
options : {iface-id-ver="66924473-ae8d-47fe-9cda-617fb49d6759", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f1:05:0a 10.241.5.10"]
name : intm-pcpos-mbb_mbb-antrag-neu-b2b-service-54575fcd89-vcklw
options : {iface-id-ver="1677ea45-f967-480d-a406-b443cb807155", requested-chassis=iepumnosw605.epu.corpintra.net}
addresses : ["0a:58:0a:f1:05:0c 10.241.5.12"]
name : intm-pcpos-mbb_vin-service-c7bccfdc6-p9clr
options : {iface-id-ver="33ceadea-678f-4ace-a0fb-2480d25aed3b", requested-chassis=iepumnosw605.epu.corpintra.net}
addresses : ["0a:58:0a:f1:05:0b 10.241.5.11"]
name : intm-pcpos-mbb_mbb-apropos-service-78586dbfc6-pxp6s
options : {iface-id-ver="a8320aa0-5cb4-4728-a72a-0dc3d2395db4", requested-chassis=iepumnosw605.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:28 10.243.0.40"]
name : intm-pcpos-mbb_mbb-antrag-neu-frontend-56d9b54c57-9gkx8
options : {iface-id-ver="ca3c25a7-14ed-4c51-b5e2-58087c5551c8", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f3:03:23 10.243.3.35"]
name : intm-pcpos-mbb_mbb-antrag-neu-service-649f9fb74-zw76x
options : {iface-id-ver="ee57e7e2-ef8c-4db2-9202-187dce1233f3", requested-chassis=iepumnosw603.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:2c 10.243.0.44"]
name : intm-pcpos-mbb_vin-service-c7bccfdc6-zkdxc
options : {iface-id-ver="b244f3e1-018c-4d6e-b576-57285d718376", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:4e 10.243.0.78"]
name : intm-pcpos-mbb_pos-monitor-proxy-7c5cf95786-94c79
options : {iface-id-ver="56e302ba-1b0f-4efd-aa45-85836e326252", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f1:02:88 10.241.2.136"]
name : intm-pcpos-mbb_mbb-antrag-neu-service-649f9fb74-tmc5g
options : {iface-id-ver="e66cd183-b203-43a9-b504-1a5cc752c3c8", requested-chassis=iepumnosw604.epu.corpintra.net}
addresses : ["0a:58:0a:f0:04:1b 10.240.4.27"]
name : intm-pcpos-mbb_pos-monitor-proxy-7c5cf95786-tmh46
options : {iface-id-ver="77cc2e0a-c3ef-4c2a-9b99-3726ad553e68", requested-chassis=iepumnosw602.epu.corpintra.net}
addresses : ["0a:58:0a:f1:03:0b 10.241.3.11"]
name : intm-pcpos-mbb_kong-postgres-58c6bc559c-dj9z6
options : {iface-id-ver="fcc8edda-9bd4-49d9-929e-9b149b912028", requested-chassis=iepumnosw604.epu.corpintra.net}
addresses : ["0a:58:0a:f3:00:27 10.243.0.39"]
name : intm-pcpos-mbb_mbb-antrag-neu-b2b-frontend-947b77f78-j6tz9
options : {iface-id-ver="d3e08cdb-a757-48f8-9b1b-44d9f51a4275", requested-chassis=iepumnosw601.epu.corpintra.net}
addresses : ["0a:58:0a:f1:02:89 10.241.2.137"]
name : intm-pcpos-mbb_mbb-antrag-verteiler-service-b79cbb8cb-n99fh
options : {iface-id-ver="c34af7ad-81f9-4f26-a73b-df04fcd4cb27", requested-chassis=iepumnosw604.epu.corpintra.net}
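The gap between the two listings can be quantified with a simple set difference. This is a rough sketch under the assumption that every pod in the namespace should have exactly one SNAT for the egressIP (the spec uses `podSelector: {}`); the IPs are copied from the output in this report, and `diff_snats` is a hypothetical helper name.

```python
import ipaddress

# Pod IPs taken from the logical port listing above.
POD_IPS = [
    "10.243.0.41", "10.241.2.135", "10.240.4.55", "10.243.0.42",
    "10.241.4.73", "10.241.2.133", "10.241.3.18", "10.243.0.221",
    "10.240.4.54", "10.241.2.134", "10.243.0.192", "10.241.5.10",
    "10.241.5.12", "10.241.5.11", "10.243.0.40", "10.243.3.35",
    "10.243.0.44", "10.243.0.78", "10.241.2.136", "10.240.4.27",
    "10.241.3.11", "10.243.0.39", "10.241.2.137",
]

# logical_ip values of the SNATs returned for external_ip 172.18.159.22.
SNAT_IPS = [
    "10.241.5.12", "10.243.0.40", "10.241.2.134", "10.241.2.137",
    "10.243.0.41", "10.240.4.55", "10.240.4.54",
]

def diff_snats(pod_ips, snat_ips):
    """Return (pod IPs without a SNAT, SNAT IPs without a pod), sorted by address."""
    pods, snats = set(pod_ips), set(snat_ips)
    by_addr = ipaddress.ip_address
    return sorted(pods - snats, key=by_addr), sorted(snats - pods, key=by_addr)

if __name__ == "__main__":
    missing, stale = diff_snats(POD_IPS, SNAT_IPS)
    print(f"{len(missing)} of {len(POD_IPS)} pods have no SNAT:")
    for ip in missing:
        print("  ", ip)
    print(f"{len(stale)} SNATs reference an IP with no matching pod")
```

With the data above, 16 of the 23 pods have no SNAT, and none of the 7 existing SNATs references an IP that is absent from the pod listing.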
It looks like ovnkube-master simply gave up and did not apply any change or recheck.
I will share the must-gathers soon.
Version-Release number of selected component (if applicable):
4.12.40
How reproducible:
Often, but only in the customer's environment
Steps to Reproduce:
Unknown
- depends on
-
OCPBUGS-35058 [release-4.14] [OVN] Stale nat rules for egressIP for large scale pods
- Closed