Type: Bug
Resolution: Done-Errata
Priority: Major
Target Version: 4.12.z
Impact: Quality / Stability / Reliability
Severity: Moderate
Sprint: SDN Sprint 242, SDN Sprint 243
Description of problem:
A pod sometimes does not work as expected when it has the same name as a previous pod on an OVN-Kubernetes cluster.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-09-05-064152
How reproducible:
Always, though it may take several attempts.
Steps to Reproduce:
1. Create a MachineSet (ms1.yaml; a sketch of one way to produce it follows this step's output)
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-nu96a-zn7mc-workera created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-nu96a-zn7mc-master-0 Running AHV Unnamed Development-LTS 6h14m
huliu-nu96a-zn7mc-master-1 Running AHV Unnamed Development-LTS 6h14m
huliu-nu96a-zn7mc-master-2 Running AHV Unnamed Development-LTS 6h14m
huliu-nu96a-zn7mc-worker-5j47v Running AHV Unnamed Development-LTS 6h9m
huliu-nu96a-zn7mc-worker-thprs Running AHV Unnamed Development-LTS 6h9m
huliu-nu96a-zn7mc-workera-x54mr Running AHV Unnamed Development-LTS 6m50s
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
huliu-nu96a-zn7mc-master-0 Ready control-plane,master 6h12m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1 Ready control-plane,master 6h12m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2 Ready control-plane,master 6h12m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v Ready worker 6h v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs Ready worker 6h v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-x54mr Ready worker 3m7s v1.25.12+26bab08
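(The ms1.yaml used above is not included in this report. As a hypothetical sketch, one common way to produce such a file is to copy an existing worker MachineSet; the MachineSet name huliu-nu96a-zn7mc-worker below is assumed from the worker machine names above.)
# Hypothetical sketch only; the actual ms1.yaml content was not captured here.
oc -n openshift-machine-api get machineset huliu-nu96a-zn7mc-worker -o yaml > ms1.yaml
# In ms1.yaml: rename metadata.name and the machine.openshift.io/cluster-api-machineset
# label/selector values to huliu-nu96a-zn7mc-workera, delete status/uid/resourceVersion/
# creationTimestamp, and set spec.replicas to 1. Then:
oc create -f ms1.yaml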
2. Create a pod on the new node (this pod freezes the node's kubelet; see the note after this step)
liuhuali@Lius-MacBook-Pro huali-test % oc create -f kubelet-killer2.yaml
pod/kubelet-killer created
liuhuali@Lius-MacBook-Pro huali-test % cat kubelet-killer2.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    kubelet-killer: ""
  name: kubelet-killer
  namespace: openshift-machine-api
spec:
  containers:
  - command:
    - pkill
    - -STOP
    - kubelet
    image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
    imagePullPolicy: Always
    name: kubelet-killer
    securityContext:
      privileged: true
  enableServiceLinks: true
  hostPID: true
  nodeName: huliu-nu96a-zn7mc-workera-x54mr
  restartPolicy: Never
liuhuali@Lius-MacBook-Pro huali-test %
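(For context: because the pod is privileged and shares the host PID namespace, pkill -STOP freezes the node's kubelet, so the node is expected to flip to NotReady shortly afterwards. A minimal way to watch for that, using the node name from this run:)
oc get node huliu-nu96a-zn7mc-workera-x54mr -w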
3. Verify the pod works as expected: the kubelet is stopped and the node goes NotReady
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
huliu-nu96a-zn7mc-master-0 Ready control-plane,master 6h13m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1 Ready control-plane,master 6h14m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2 Ready control-plane,master 6h13m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v Ready worker 6h2m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs Ready worker 6h2m v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-x54mr NotReady worker 4m43s v1.25.12+26bab08
liuhuali@Lius-MacBook-Pro huali-test % oc describe pod kubelet-killer
Name: kubelet-killer
Namespace: openshift-machine-api
Priority: 0
Node: huliu-nu96a-zn7mc-workera-x54mr/10.0.132.101
Start Time: Wed, 06 Sep 2023 15:33:43 +0800
Labels: kubelet-killer=
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.130.8.7/23"],"mac_address":"0a:58:0a:82:08:07","gateway_ips":["10.130.8.1"],"ip_address":"10.130.8.7/23","...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.130.8.7"
],
"mac": "0a:58:0a:82:08:07",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.130.8.7"
],
"mac": "0a:58:0a:82:08:07",
"default": true,
"dns": {}
}]
openshift.io/scc: privileged
Status: Pending
IP:
IPs: <none>
Containers:
kubelet-killer:
Container ID:
Image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
Image ID:
Port: <none>
Host Port: <none>
Command:
pkill
-STOP
kubelet
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nm9vd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-nm9vd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal AddedInterface 90s multus Add eth0 [10.130.8.7/23] from ovn-kubernetes
Normal Pulling 90s kubelet Pulling image "quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c"
Normal Pulled 87s kubelet Successfully pulled image "quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c" in 2.310348601s (2.310355399s including waiting)
Normal Created 87s kubelet Created container kubelet-killer
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-nu96a-zn7mc-master-0 Running AHV Unnamed Development-LTS 6h17m
huliu-nu96a-zn7mc-master-1 Running AHV Unnamed Development-LTS 6h17m
huliu-nu96a-zn7mc-master-2 Running AHV Unnamed Development-LTS 6h17m
huliu-nu96a-zn7mc-worker-5j47v Running AHV Unnamed Development-LTS 6h11m
huliu-nu96a-zn7mc-worker-thprs Running AHV Unnamed Development-LTS 6h11m
huliu-nu96a-zn7mc-workera-x54mr Running AHV Unnamed Development-LTS 9m5s
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-operator-854c6755f5-r9c2k 2/2 Running 0 5h41m
cluster-baremetal-operator-976487bc9-7czpk 2/2 Running 0 5h41m
control-plane-machine-set-operator-69684bcccd-c6jnf 1/1 Running 0 5h41m
kubelet-killer 0/1 ContainerCreating 0 98s
machine-api-controllers-7f574b69b5-w5swt 7/7 Running 0 155m
machine-api-operator-7f46db4fcc-v6w9p 2/2 Running 0 5h41m
4. Try this once again: delete the old machine and let the MachineSet create a replacement (optional wait commands follow this step's output)
liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-nu96a-zn7mc-workera-x54mr
machine.machine.openshift.io "huliu-nu96a-zn7mc-workera-x54mr" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-operator-854c6755f5-r9c2k 2/2 Running 0 5h42m
cluster-baremetal-operator-976487bc9-7czpk 2/2 Running 0 5h42m
control-plane-machine-set-operator-69684bcccd-c6jnf 1/1 Running 0 5h42m
kubelet-killer 0/1 Terminating 0 2m28s
machine-api-controllers-7f574b69b5-w5swt 7/7 Running 0 156m
machine-api-operator-7f46db4fcc-v6w9p 2/2 Running 0 5h42m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-nu96a-zn7mc-master-0 Running AHV Unnamed Development-LTS 6h18m
huliu-nu96a-zn7mc-master-1 Running AHV Unnamed Development-LTS 6h18m
huliu-nu96a-zn7mc-master-2 Running AHV Unnamed Development-LTS 6h18m
huliu-nu96a-zn7mc-worker-5j47v Running AHV Unnamed Development-LTS 6h12m
huliu-nu96a-zn7mc-worker-thprs Running AHV Unnamed Development-LTS 6h12m
huliu-nu96a-zn7mc-workera-t8dj2 Provisioning 27s
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-operator-854c6755f5-r9c2k 2/2 Running 0 5h44m
cluster-baremetal-operator-976487bc9-7czpk 2/2 Running 0 5h44m
control-plane-machine-set-operator-69684bcccd-c6jnf 1/1 Running 0 5h44m
machine-api-controllers-7f574b69b5-w5swt 7/7 Running 0 158m
machine-api-operator-7f46db4fcc-v6w9p 2/2 Running 0 5h44m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-nu96a-zn7mc-master-0 Running AHV Unnamed Development-LTS 6h27m
huliu-nu96a-zn7mc-master-1 Running AHV Unnamed Development-LTS 6h27m
huliu-nu96a-zn7mc-master-2 Running AHV Unnamed Development-LTS 6h27m
huliu-nu96a-zn7mc-worker-5j47v Running AHV Unnamed Development-LTS 6h21m
huliu-nu96a-zn7mc-worker-thprs Running AHV Unnamed Development-LTS 6h21m
huliu-nu96a-zn7mc-workera-t8dj2 Running AHV Unnamed Development-LTS 9m46s
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
huliu-nu96a-zn7mc-master-0 Ready control-plane,master 6h24m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1 Ready control-plane,master 6h25m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2 Ready control-plane,master 6h24m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v Ready worker 6h13m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs Ready worker 6h13m v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-t8dj2 Ready worker 6m v1.25.12+26bab08
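(Optional: rather than polling oc get machine / oc get node by hand, the replacement can be waited on; the node name below is the one created in this run:)
oc -n openshift-machine-api get machine -w
oc wait --for=condition=Ready node/huliu-nu96a-zn7mc-workera-t8dj2 --timeout=15m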
5. Create a pod with the same name as the previous one (kubelet-killer) on the new node
liuhuali@Lius-MacBook-Pro huali-test % oc create -f kubelet-killer2.yaml
pod/kubelet-killer created
liuhuali@Lius-MacBook-Pro huali-test % cat kubelet-killer2.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    kubelet-killer: ""
  name: kubelet-killer
  namespace: openshift-machine-api
spec:
  containers:
  - command:
    - pkill
    - -STOP
    - kubelet
    image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
    imagePullPolicy: Always
    name: kubelet-killer
    securityContext:
      privileged: true
  enableServiceLinks: true
  hostPID: true
  nodeName: huliu-nu96a-zn7mc-workera-t8dj2
  restartPolicy: Never
6. Observe that the pod does not work as expected
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-nu96a-zn7mc-master-0 Running AHV Unnamed Development-LTS 6h35m
huliu-nu96a-zn7mc-master-1 Running AHV Unnamed Development-LTS 6h35m
huliu-nu96a-zn7mc-master-2 Running AHV Unnamed Development-LTS 6h35m
huliu-nu96a-zn7mc-worker-5j47v Running AHV Unnamed Development-LTS 6h29m
huliu-nu96a-zn7mc-worker-thprs Running AHV Unnamed Development-LTS 6h29m
huliu-nu96a-zn7mc-workera-t8dj2 Running AHV Unnamed Development-LTS 17m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
huliu-nu96a-zn7mc-master-0 Ready control-plane,master 6h32m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-1 Ready control-plane,master 6h33m v1.25.12+26bab08
huliu-nu96a-zn7mc-master-2 Ready control-plane,master 6h32m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-5j47v Ready worker 6h21m v1.25.12+26bab08
huliu-nu96a-zn7mc-worker-thprs Ready worker 6h21m v1.25.12+26bab08
huliu-nu96a-zn7mc-workera-t8dj2 Ready worker 14m v1.25.12+26bab08
liuhuali@Lius-MacBook-Pro huali-test % oc get pod
NAME READY STATUS RESTARTS AGE
cluster-autoscaler-operator-854c6755f5-r9c2k 2/2 Running 0 6h
cluster-baremetal-operator-976487bc9-7czpk 2/2 Running 0 6h
control-plane-machine-set-operator-69684bcccd-c6jnf 1/1 Running 0 6h
kubelet-killer 0/1 ContainerCreating 0 7m18s
machine-api-controllers-7f574b69b5-w5swt 7/7 Running 0 174m
machine-api-operator-7f46db4fcc-v6w9p 2/2 Running 0 6h
liuhuali@Lius-MacBook-Pro huali-test % oc describe pod kubelet-killer
Name: kubelet-killer
Namespace: openshift-machine-api
Priority: 0
Node: huliu-nu96a-zn7mc-workera-t8dj2/10.0.132.67
Start Time: Wed, 06 Sep 2023 15:46:29 +0800
Labels: kubelet-killer=
Annotations: openshift.io/scc: node-exporter
Status: Pending
IP:
IPs: <none>
Containers:
kubelet-killer:
Container ID:
Image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
Image ID:
Port: <none>
Host Port: <none>
Command:
pkill
-STOP
kubelet
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dcq5h (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-dcq5h:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrorAddingLogicalPort 7m30s controlplane deleteLogicalPort failed for pod openshift-machine-api_kubelet-killer: cannot delete GR SNAT for pod openshift-machine-api/kubelet-killer: failed create operation for deleting SNAT rule for pod on gateway router GR_huliu-nu96a-zn7mc-workera-x54mr: unable to get NAT entries for router &{UUID: Copp:<nil> Enabled:<nil> ExternalIDs:map[] LoadBalancer:[] LoadBalancerGroup:[] Name:GR_huliu-nu96a-zn7mc-workera-x54mr Nat:[] Options:map[] Policies:[] Ports:[] StaticRoutes:[]}: failed to get router: GR_huliu-nu96a-zn7mc-workera-x54mr, error: object not found
Warning FailedCreatePodSandBox 5m29s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef] [openshift-machine-api/kubelet-killer c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
Warning FailedCreatePodSandBox 3m17s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0] [openshift-machine-api/kubelet-killer dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
Warning FailedCreatePodSandBox 65s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer 4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2] [openshift-machine-api/kubelet-killer 4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
The warning events reference "GR_huliu-nu96a-zn7mc-workera-x54mr", but huliu-nu96a-zn7mc-workera-x54mr is the previous (already deleted) node; in step 5 the pod was created on huliu-nu96a-zn7mc-workera-t8dj2.
If the new pod is created with a different name, the issue does not occur.
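(A troubleshooting sketch for confirming the stale OVN state referenced in the events above; the ovnkube-master pod is looked up by label, and the container name may differ between releases, so treat this as an illustration rather than the exact commands used in this run:)
OVNKUBE_MASTER=$(oc -n openshift-ovn-kubernetes get pod -l app=ovnkube-master -o jsonpath='{.items[0].metadata.name}')
# The gateway router for the deleted node should no longer exist in the NB DB:
oc -n openshift-ovn-kubernetes exec "$OVNKUBE_MASTER" -c nbdb -- ovn-nbctl lr-list | grep GR_huliu-nu96a-zn7mc-workera || echo "no gateway router found"
# Yet deleteLogicalPort for the stale kubelet-killer pod still tries to remove an SNAT
# from GR_huliu-nu96a-zn7mc-workera-x54mr, which is why the new pod's CNI ADD keeps failing.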
Actual results:
The pod does not work as expected when it has the same name as a previous pod.
Expected results:
The pod should work as expected even when it has the same name as a previous pod.
Additional info:
The same case works as expected on an SDN network cluster. Discussion in Slack: https://redhat-internal.slack.com/archives/CH76YSYSC/p1693983428736929
clones: OCPBUGS-18672 Pod sometimes doesn’t work as expected when it has the same name with previous pods on OVN network cluster (Closed)
depends on: OCPBUGS-18672 Pod sometimes doesn’t work as expected when it has the same name with previous pods on OVN network cluster (Closed)
duplicates: OCPBUGS-19544 Pods crashlooping and stuck in init (Closed)
duplicates: OCPBUGS-20432 error adding container to network "ovn-kubernetes" with no degraded clusteroperators (Closed)
links to: RHBA-2023:6126 OpenShift Container Platform 4.12.z bug fix update