Type: Bug
Resolution: Unresolved
Priority: Undefined
Affects Version: 4.14
Impact: Quality / Stability / Reliability
Severity: Important
Description of problem:
OCP 4.14.22 - Created a NetworkAttachmentDefinition for ODF networking that defines a macvlan interface using Whereabouts IPAM. The interface config was defined through the Network operator spec, so a reconciler has been created. We observed two pods holding the same IP address on net1, which is intermittently causing packet loss and traffic interruptions between ODF peers.
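For reference, a minimal sketch of the kind of Cluster Network Operator stanza that would render this NAD and deploy the Whereabouts reconciler. The CNI values are copied from the NAD in "Steps to Reproduce" below; the exact additionalNetworks entry in the customer's Network CR was not captured, so the surrounding fields here are an assumption:

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
  # rawCNIConfig copied from the ocs-cluster-network NAD; everything else is assumed
  - name: ocs-cluster-network
    namespace: openshift-storage
    type: Raw
    rawCNIConfig: |-
      {
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": "tenant-vlan.98",
        "mode": "bridge",
        "ipam": {
          "type": "whereabouts",
          "range": "192.168.255.0/24"
        }
      }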
Version-Release number of selected component (if applicable):
4.14.22
How reproducible:
One time - customer environment; no internal reproduction available.
Steps to Reproduce:
MacVLAN NAD definition:

fi-918:~$ oc get net-attach-def -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    creationTimestamp: "2024-07-31T19:50:46Z"
    generation: 1
    name: ocs-cluster-network
    namespace: openshift-storage
    resourceVersion: "8297200"
    uid: 13516483-ee53-45db-b856-f8d593be7bdf
  spec:
    config: |-
      {
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": "tenant-vlan.98",
        "mode": "bridge",
        "ipam": {
          "type": "whereabouts",
          "range": "192.168.255.0/24"
        }
      }
kind: List
metadata:
  resourceVersion: ""

Pod IPs that are colliding:

OSD-103 on storage-0:
Name: rook-ceph-osd-103-9cfb79ddf-rg8gm
topology-location-host=storage-0-<redacted>
    "name": "openshift-storage/ocs-cluster-network",
    "interface": "net1",
    "ips": [
        "192.168.255.8"
    ],
    "mac": "1e:a0:ce:4d:3c:8f",
    "dns": {}
}]

OSD-71 on storage-1:
Name: rook-ceph-osd-71-5ff7687d79-8fhcn
topology-location-host=storage-1-<redacted>
},{
    "name": "openshift-storage/ocs-cluster-network",
    "interface": "net1",
    "ips": [
        "192.168.255.8"
    ],
    "mac": "f2:3b:2b:04:be:b1",
    "dns": {}
}]
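To see whether the duplicate also exists in Whereabouts' stored state, the allocations for the range can be inspected directly. A minimal sketch, assuming the Whereabouts CRs live in openshift-multus (the OCP default) and that the pool is named after the normalized range (192.168.255.0-24); neither was verified on the customer cluster:

# List Whereabouts pools and per-IP reservations (namespace assumed)
oc get ippools.whereabouts.cni.cncf.io -n openshift-multus
oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -n openshift-multus

# Dump the pool for 192.168.255.0/24 and check which pod reference(s) the
# allocation for .8 carries (pool name assumed to be the normalized range)
oc get ippools.whereabouts.cni.cncf.io 192.168.255.0-24 -n openshift-multus -o yaml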
Actual results:
Pods are intermittently dropping packets. The packet capture also shows ARP queries such as:

1094967 Who has 192.168.255.34? Tell 192.168.255.8 (duplicate use of 192.168.255.8 detected!)

The conversation between .34 and .8 is interrupted; packets routed to .8 are periodically dropped/retransmitted (likely being delivered to the wrong backend).
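As a cross-check, duplicate ownership of the address can be probed from any node or pod attached to the 192.168.255.0/24 segment. A hedged example, assuming iputils arping is available and that the macvlan parent tenant-vlan.98 is usable from a node shell (not verified in the customer environment); with two owners, replies should come back from both 1e:a0:ce:4d:3c:8f and f2:3b:2b:04:be:b1:

# Duplicate address detection probe on the macvlan parent interface
arping -D -c 3 -I tenant-vlan.98 192.168.255.8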
Expected results:
Multus should validate and prevent duplicate IP bindings from Whereabouts pools across multiple hosts. The reconciler is also not catching or cleaning this up.
Additional info:
This network is required for storage traffic handling. A workaround exists: forcibly delete and clean up the duplicate-IP pods to force a new IP allocation and resolve the overlap.
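A rough sketch of that workaround, assuming the OSD on storage-1 is the replica chosen for deletion and that the stale state (if any) sits in the assumed 192.168.255.0-24 pool / 192.168.255.8 reservation in openshift-multus; adjust to whichever pod actually holds the stale allocation:

# Delete one of the colliding pods; its deployment recreates it and requests a fresh IP
oc delete pod rook-ceph-osd-71-5ff7687d79-8fhcn -n openshift-storage

# If the old reservation is not released, inspect and clean up the Whereabouts state
oc get overlappingrangeipreservations.whereabouts.cni.cncf.io 192.168.255.8 -n openshift-multus -o yaml
oc edit ippools.whereabouts.cni.cncf.io 192.168.255.0-24 -n openshift-multus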