Bug
Resolution: Unresolved
Minor
4.21
Quality / Stability / Reliability
Description of problem
Seen in 4.21 CI:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.21-ocp-e2e-upgrade-aws-ovn-arm64/1983414271299555328/artifacts/ocp-e2e-upgrade-aws-ovn-arm64/gather-extra/artifacts/pods.json | jq '.items[] | select(.metadata.name == "pod-network-to-pod-network-disruption-poller-564f6885f5-q9qjp") | {metadata: (.metadata | {creationTimestamp, deletionTimestamp}), status}'
{
"metadata": {
"creationTimestamp": "2025-10-29T07:18:28Z",
"deletionTimestamp": "2025-10-29T08:19:53Z"
},
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2025-10-29T08:18:42Z",
"message": "Eviction API: evicting",
"reason": "EvictionByEvictionAPI",
"status": "True",
"type": "DisruptionTarget"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2025-10-29T07:18:28Z",
"observedGeneration": 1,
"status": "False",
"type": "PodReadyToStartContainers"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2025-10-29T07:18:28Z",
"observedGeneration": 1,
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2025-10-29T07:18:28Z",
"message": "containers with unready status: [disruption-poller]",
"observedGeneration": 1,
"reason": "ContainersNotReady",
"status": "False",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2025-10-29T07:18:28Z",
"message": "containers with unready status: [disruption-poller]",
"observedGeneration": 1,
"reason": "ContainersNotReady",
"status": "False",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2025-10-29T07:18:28Z",
"observedGeneration": 1,
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e65bfb548f2a436a57b61ab5485ce145e88ee7c97c0834b48b4733843295fca",
"imageID": "",
"lastState": {},
"name": "disruption-poller",
"ready": false,
"restartCount": 0,
"started": false,
"state": {
"waiting": {
"reason": "ContainerCreating"
}
},
"volumeMounts": [
{
"mountPath": "/var/log/persistent-logs",
"name": "persistent-log-dir"
},
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"name": "kube-api-access-x7tqq",
"readOnly": true,
"recursiveReadOnly": "Disabled"
}
]
}
],
"hostIP": "10.0.73.59",
"hostIPs": [
{
"ip": "10.0.73.59"
}
],
"observedGeneration": 1,
"phase": "Pending",
"qosClass": "BestEffort",
"startTime": "2025-10-29T07:18:28Z"
}
}
Version-Release number of selected component
Seen in 4.21. Unclear how this issue presents in 4.20 and earlier.
How reproducible
Unclear.
Steps to Reproduce
1. Run lots of CI.
2. Have some Pods get stuck in ContainerCreating.
3. Have non-experts like me try to understand, from Pod status, where in the process they got stuck (see the jq sketch below).
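For reference, a minimal jq sketch of that manual triage, assuming the gather-extra pods.json from the Description is reachable at $PODS_JSON_URL (placeholder for the full gcsweb URL above); it lists every container stuck in a waiting state along with whatever reason/message the status carries:
$ curl -s "$PODS_JSON_URL" | jq -r '
    .items[]
    | . as $pod
    | .status.containerStatuses[]?
    | select(.state.waiting != null)
    # namespace, pod, container, waiting reason, waiting message (if any)
    | [$pod.metadata.namespace, $pod.metadata.name, .name,
       .state.waiting.reason, (.state.waiting.message // "-")]
    | @tsv'
For the pod quoted above, that prints a ContainerCreating row with no message at all, which is exactly the dead end this bug is about.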
Actual results
None of the conditions clearly says what the next step is on the way to Ready=True.
Expected results
Clear messaging about what we're waiting for, and what we're seeing instead. Having a message on PodReadyToStartContainers might be a good next step.
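As a concrete illustration, pulling the PodReadyToStartContainers condition for the stuck pod today shows how little there is to go on (same $PODS_JSON_URL placeholder as above):
$ curl -s "$PODS_JSON_URL" | jq '
    .items[]
    | select(.metadata.name == "pod-network-to-pod-network-disruption-poller-564f6885f5-q9qjp")
    | .status.conditions[]
    | select(.type == "PodReadyToStartContainers")'
# Returns only lastTransitionTime / observedGeneration / status / type.
# A reason (e.g. "SandboxCreating") and a message (e.g. "waiting for the
# runtime to set up the pod sandbox and network"), hypothetical wording on
# both counts, would tell the reader what the kubelet is actually blocked on.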
Additional info
KubeContainerWaiting is limited to OCP-core namespaces, so it doesn't cover e2e namespaces like e2e-pod-network-disruption-test-sc25z. But riffing on that metric in PromeCIeus:
max by (namespace, pod, container, reason) (
kube_pod_container_status_waiting_reason{reason!="CrashLoopBackOff", job="kube-state-metrics"} > 0
*
(kube_pod_container_status_waiting_reason{reason!="CrashLoopBackOff", job="kube-state-metrics"} offset 10m > 0)
)
turns up two Pods that were stuck this way for at least 15m (one eventually recovered, or was successfully deleted):
{container="disruption-poller", namespace="e2e-pod-network-disruption-test-sc25z", pod="pod-network-to-host-network-disruption-poller-565bbc6fc-w8swj", reason="ContainerCreating"}
{container="disruption-poller", namespace="e2e-pod-network-disruption-test-sc25z", pod="pod-network-to-pod-network-disruption-poller-564f6885f5-q9qjp", reason="ContainerCreating"}
Shipping a KubeContainerWaiting runbook might be another way to make this kind of issue more debuggable.
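If a runbook does get written, a rough sketch of the triage it might walk through (assuming access to the affected cluster; namespace and pod names taken from the query results above):
$ NS=e2e-pod-network-disruption-test-sc25z
$ POD=pod-network-to-pod-network-disruption-poller-564f6885f5-q9qjp
# Events usually carry the real reason: FailedCreatePodSandBox, FailedMount,
# image pull failures, etc.
$ oc -n "$NS" describe pod "$POD"
$ oc -n "$NS" get events --field-selector involvedObject.name="$POD" --sort-by=.lastTimestamp
# If the events are empty or have aged out, check kubelet logs on the node
# that hosts the pod (hostIP 10.0.73.59 in the status above).
$ oc adm node-logs <node-name> -u kubelet | grep "$POD"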