Bug
Resolution: Done-Errata
Major
4.13.z, 4.12, 4.14.0
Moderate
SDN Sprint 241, SDN Sprint 242
Approved
Description of problem:
Pod sometimes doesn't work as expected when it has the same name as a previous pod on an OVN network cluster.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-09-05-064152
How reproducible:
Always, though it may take several attempts.
Steps to Reproduce:
1. Create a MachineSet:

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-nu96a-zn7mc-workera created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                              PHASE     TYPE   REGION    ZONE              AGE
huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h14m
huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h14m
huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h14m
huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h9m
huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h9m
huliu-nu96a-zn7mc-workera-x54mr   Running   AHV    Unnamed   Development-LTS   6m50s
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                              STATUS   ROLES                  AGE     VERSION
huliu-nu96a-zn7mc-master-0        Ready    control-plane,master   6h12m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1        Ready    control-plane,master   6h12m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2        Ready    control-plane,master   6h12m   v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v    Ready    worker                 6h      v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs    Ready    worker                 6h      v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-x54mr   Ready    worker                 3m7s    v1.25.12+26bab08

2. Create a pod on the new node:

liuhuali@Lius-MacBook-Pro huali-test % oc create -f kubelet-killer2.yaml
pod/kubelet-killer created
liuhuali@Lius-MacBook-Pro huali-test % cat kubelet-killer2.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    kubelet-killer: ""
  name: kubelet-killer
  namespace: openshift-machine-api
spec:
  containers:
  - command:
    - pkill
    - -STOP
    - kubelet
    image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
    imagePullPolicy: Always
    name: kubelet-killer
    securityContext:
      privileged: true
  enableServiceLinks: true
  hostPID: true
  nodeName: huliu-nu96a-zn7mc-workera-x54mr
  restartPolicy: Never

3. The pod worked as expected (it stops the kubelet, so the node goes NotReady):

liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                              STATUS     ROLES                  AGE     VERSION
huliu-nu96a-zn7mc-master-0        Ready      control-plane,master   6h13m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1        Ready      control-plane,master   6h14m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2        Ready      control-plane,master   6h13m   v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v    Ready      worker                 6h2m    v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs    Ready      worker                 6h2m    v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-x54mr   NotReady   worker                 4m43s   v1.25.12+26bab08
liuhuali@Lius-MacBook-Pro huali-test % oc describe pod kubelet-killer
Name:         kubelet-killer
Namespace:    openshift-machine-api
Priority:     0
Node:         huliu-nu96a-zn7mc-workera-x54mr/10.0.132.101
Start Time:   Wed, 06 Sep 2023 15:33:43 +0800
Labels:       kubelet-killer=
Annotations:  k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.130.8.7/23"],"mac_address":"0a:58:0a:82:08:07","gateway_ips":["10.130.8.1"],"ip_address":"10.130.8.7/23",...
              k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.130.8.7" ], "mac": "0a:58:0a:82:08:07", "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.130.8.7" ], "mac": "0a:58:0a:82:08:07", "default": true, "dns": {} }]
              openshift.io/scc: privileged
Status:       Pending
IP:
IPs:          <none>
Containers:
  kubelet-killer:
    Container ID:
    Image:         quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      pkill
      -STOP
      kubelet
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nm9vd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-nm9vd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age   From     Message
  ----    ------          ----  ----     -------
  Normal  AddedInterface  90s   multus   Add eth0 [10.130.8.7/23] from ovn-kubernetes
  Normal  Pulling         90s   kubelet  Pulling image "quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c"
  Normal  Pulled          87s   kubelet  Successfully pulled image "quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c" in 2.310348601s (2.310355399s including waiting)
  Normal  Created         87s   kubelet  Created container kubelet-killer
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                              PHASE     TYPE   REGION    ZONE              AGE
huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h17m
huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h17m
huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h17m
huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h11m
huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h11m
huliu-nu96a-zn7mc-workera-x54mr   Running   AHV    Unnamed   Development-LTS   9m5s
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME                                                  READY   STATUS              RESTARTS   AGE
cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running             0          5h41m
cluster-baremetal-operator-976487bc9-7czpk            2/2     Running             0          5h41m
control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running             0          5h41m
kubelet-killer                                        0/1     ContainerCreating   0          98s
machine-api-controllers-7f574b69b5-w5swt              7/7     Running             0          155m
machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running             0          5h41m

4. Try this once again.
Delete the old machine and let the MachineSet create a new one:

liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-nu96a-zn7mc-workera-x54mr
machine.machine.openshift.io "huliu-nu96a-zn7mc-workera-x54mr" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME                                                  READY   STATUS        RESTARTS   AGE
cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running       0          5h42m
cluster-baremetal-operator-976487bc9-7czpk            2/2     Running       0          5h42m
control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running       0          5h42m
kubelet-killer                                        0/1     Terminating   0          2m28s
machine-api-controllers-7f574b69b5-w5swt              7/7     Running       0          156m
machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running       0          5h42m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                              PHASE          TYPE   REGION    ZONE              AGE
huliu-nu96a-zn7mc-master-0        Running        AHV    Unnamed   Development-LTS   6h18m
huliu-nu96a-zn7mc-master-1        Running        AHV    Unnamed   Development-LTS   6h18m
huliu-nu96a-zn7mc-master-2        Running        AHV    Unnamed   Development-LTS   6h18m
huliu-nu96a-zn7mc-worker-5j47v    Running        AHV    Unnamed   Development-LTS   6h12m
huliu-nu96a-zn7mc-worker-thprs    Running        AHV    Unnamed   Development-LTS   6h12m
huliu-nu96a-zn7mc-workera-t8dj2   Provisioning                                      27s
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME                                                  READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running   0          5h44m
cluster-baremetal-operator-976487bc9-7czpk            2/2     Running   0          5h44m
control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running   0          5h44m
machine-api-controllers-7f574b69b5-w5swt              7/7     Running   0          158m
machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running   0          5h44m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                              PHASE     TYPE   REGION    ZONE              AGE
huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h27m
huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h27m
huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h27m
huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h21m
huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h21m
huliu-nu96a-zn7mc-workera-t8dj2   Running   AHV    Unnamed   Development-LTS   9m46s
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                              STATUS   ROLES                  AGE     VERSION
huliu-nu96a-zn7mc-master-0        Ready    control-plane,master   6h24m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1        Ready    control-plane,master   6h25m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2        Ready    control-plane,master   6h24m   v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v    Ready    worker                 6h13m   v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs    Ready    worker                 6h13m   v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-t8dj2   Ready    worker                 6m      v1.25.12+26bab08

5. Create a pod with the same name as the previous one (here, kubelet-killer) on the new node:

liuhuali@Lius-MacBook-Pro huali-test % oc create -f kubelet-killer2.yaml
pod/kubelet-killer created
liuhuali@Lius-MacBook-Pro huali-test % cat kubelet-killer2.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    kubelet-killer: ""
  name: kubelet-killer
  namespace: openshift-machine-api
spec:
  containers:
  - command:
    - pkill
    - -STOP
    - kubelet
    image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
    imagePullPolicy: Always
    name: kubelet-killer
    securityContext:
      privileged: true
  enableServiceLinks: true
  hostPID: true
  nodeName: huliu-nu96a-zn7mc-workera-t8dj2
  restartPolicy: Never

6. Check that the pod does not work as expected:
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                              PHASE     TYPE   REGION    ZONE              AGE
huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h35m
huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h35m
huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h35m
huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h29m
huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h29m
huliu-nu96a-zn7mc-workera-t8dj2   Running   AHV    Unnamed   Development-LTS   17m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                              STATUS   ROLES                  AGE     VERSION
huliu-nu96a-zn7mc-master-0        Ready    control-plane,master   6h32m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1        Ready    control-plane,master   6h33m   v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2        Ready    control-plane,master   6h32m   v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v    Ready    worker                 6h21m   v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs    Ready    worker                 6h21m   v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-t8dj2   Ready    worker                 14m     v1.25.12+26bab08
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME                                                  READY   STATUS              RESTARTS   AGE
cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running             0          6h
cluster-baremetal-operator-976487bc9-7czpk            2/2     Running             0          6h
control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running             0          6h
kubelet-killer                                        0/1     ContainerCreating   0          7m18s
machine-api-controllers-7f574b69b5-w5swt              7/7     Running             0          174m
machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running             0          6h
liuhuali@Lius-MacBook-Pro huali-test % oc describe pod kubelet-killer
Name:         kubelet-killer
Namespace:    openshift-machine-api
Priority:     0
Node:         huliu-nu96a-zn7mc-workera-t8dj2/10.0.132.67
Start Time:   Wed, 06 Sep 2023 15:46:29 +0800
Labels:       kubelet-killer=
Annotations:  openshift.io/scc: node-exporter
Status:       Pending
IP:
IPs:          <none>
Containers:
  kubelet-killer:
    Container ID:
    Image:         quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      pkill
      -STOP
      kubelet
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dcq5h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-dcq5h:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age    From          Message
  ----     ------                  ----   ----          -------
  Warning  ErrorAddingLogicalPort  7m30s  controlplane  deleteLogicalPort failed for pod openshift-machine-api_kubelet-killer: cannot delete GR SNAT for pod openshift-machine-api/kubelet-killer: failed create operation for deleting SNAT rule for pod on gateway router GR_huliu-nu96a-zn7mc-workera-x54mr: unable to get NAT entries for router &{UUID: Copp:<nil> Enabled:<nil> ExternalIDs:map[] LoadBalancer:[] LoadBalancerGroup:[] Name:GR_huliu-nu96a-zn7mc-workera-x54mr Nat:[] Options:map[] Policies:[] Ports:[] StaticRoutes:[]}: failed to get router: GR_huliu-nu96a-zn7mc-workera-x54mr, error: object not found
  Warning  FailedCreatePodSandBox  5m29s  kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef] [openshift-machine-api/kubelet-killer c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded'
  Warning  FailedCreatePodSandBox  3m17s  kubelet       (same "failed to get pod annotation: timed out waiting for annotations" error, sandbox dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0)
  Warning  FailedCreatePodSandBox  65s    kubelet       (same "failed to get pod annotation: timed out waiting for annotations" error, sandbox 4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2)

Note that the Warning events reference "GR_huliu-nu96a-zn7mc-workera-x54mr", the gateway router of the previous, already-deleted node huliu-nu96a-zn7mc-workera-x54mr, even though this pod was created on huliu-nu96a-zn7mc-workera-t8dj2 in step 5. If the new pod is created with a different name, the issue does not occur.
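The deleteLogicalPort error above suggests ovnkube-master is still retrying SNAT cleanup for the old pod instance against a gateway router that no longer exists, which blocks the add of the new, same-named pod. A sketch for confirming the stale state (the pod label, container name, and ovn-nbctl flags here are assumptions and vary by OVN-Kubernetes release):

```shell
# Hypothetical check: list the logical routers in the OVN northbound DB and
# confirm the deleted node's gateway router (GR_huliu-nu96a-zn7mc-workera-x54mr)
# is gone while the new node's router (GR_huliu-nu96a-zn7mc-workera-t8dj2) exists.
POD=$(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-master \
  -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-ovn-kubernetes exec "$POD" -c nbdb -- \
  ovn-nbctl --no-leader-only lr-list
```

If the old router is indeed absent from the NB DB, the "object not found" failure is consistent with stale per-pod state keyed by pod name rather than by pod UID.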
Actual results:
The pod does not work as expected when it has the same name as a previous pod.
Expected results:
The pod should work as expected even when it has the same name as a previous pod.
Additional info:
The same case works as expected on an SDN network cluster. Slack discussion: https://redhat-internal.slack.com/archives/CH76YSYSC/p1693983428736929
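Since a differently named pod is unaffected, one possible workaround for reproducer manifests like kubelet-killer2.yaml is to let the API server generate a unique name each time instead of reusing a fixed one. A sketch (only the generateName field is new; the rest follows the reproducer above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    kubelet-killer: ""
  # generateName makes the API server append a random suffix (for example
  # kubelet-killer-x7x9q), so the pod name, and hence the OVN logical port
  # name, is never reused across recreations
  generateName: kubelet-killer-
  namespace: openshift-machine-api
spec:
  # ...same spec as kubelet-killer2.yaml...
```

Note that generateName only takes effect with `oc create` (not `oc apply`, which requires a fixed name).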
depends on
  OCPBUGS-18895 Pod sometimes doesn’t work as expected when it has the same name with previous pods on OVN network cluster (Closed)
is cloned by
  OCPBUGS-18672 Pod sometimes doesn’t work as expected when it has the same name with previous pods on OVN network cluster (Closed)
  OCPBUGS-18895 Pod sometimes doesn’t work as expected when it has the same name with previous pods on OVN network cluster (Closed)
is depended on by
  OCPBUGS-18672 Pod sometimes doesn’t work as expected when it has the same name with previous pods on OVN network cluster (Closed)
links to
  RHSA-2023:5006 OpenShift Container Platform 4.14.z security update