-
Bug
-
Resolution: Done
-
Major
-
4.19.0
-
Quality / Stability / Reliability
-
False
-
Rejected
We are observing CI failures where a pod does not become ready within two minutes.
This appears to be caused by CRI-O taking a long time to call the CNI after it has started handling the pod sandbox:
namespace/pod: e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1
node: ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh
journal logs for the affected pod:
Jan 07 14:48:02.576356 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:02.576297 2349 kubelet.go:2421] "SyncLoop ADD" source="api" pods=["e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"]
Jan 07 14:48:02.582182 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:02.581950 2349 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"
Jan 07 14:48:02.629312 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:02.627694 2349 kubelet.go:2428] "SyncLoop UPDATE" source="api" pods=["e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"]
Jan 07 14:48:02.731471 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:02.731392 2349 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-jgj55\" (UniqueName: \"kubernetes.io/projected/290c600b-130a-4573-898e-356c9b572b26-kube-api-access-jgj55\") pod \"backend-1\" (UID: \"290c600b-130a-4573-898e-356c9b572b26\") " pod="e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"
Jan 07 14:48:02.890515 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:02.887526 2349 kubelet.go:2428] "SyncLoop UPDATE" source="api" pods=["e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"]
Jan 07 14:48:03.017712 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:03.017563 2349 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kube-api-access-jgj55\" (UniqueName: \"kubernetes.io/projected/290c600b-130a-4573-898e-356c9b572b26-kube-api-access-jgj55\") pod \"backend-1\" (UID: \"290c600b-130a-4573-898e-356c9b572b26\") " pod="e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"
Jan 07 14:48:03.602767 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:03.602725 2349 operation_generator.go:637] "MountVolume.SetUp succeeded for volume \"kube-api-access-jgj55\" (UniqueName: \"kubernetes.io/projected/290c600b-130a-4573-898e-356c9b572b26-kube-api-access-jgj55\") pod \"backend-1\" (UID: \"290c600b-130a-4573-898e-356c9b572b26\") " pod="e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"
Jan 07 14:48:07.629929 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:48:07.623891 2349 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"
Jan 07 14:48:07.796692 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:48:07.788297343Z" level=info msg="Running pod sandbox: e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1/POD" id=45b81d06-c9ff-4206-a5f8-cc79431bc78d name=/runtime.v1.RuntimeService/RunPodSandbox
Jan 07 14:49:34.926581 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:34.919856848Z" level=info msg="Got pod network &{Name:backend-1 Namespace:e2e-test-endpointslices-mirror-e2e-xkpr2 ID:b8c0fc452705aac68f281fc7e4e337602aa77df8b0ddedf2b17539cc50e7e1b7 UID:290c600b-130a-4573-898e-356c9b572b26 NetNS:/var/run/netns/df611b18-48e8-4dfe-9096-ba74a2d12cff Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[] CgroupPath:kubepods-besteffort-pod290c600b_130a_4573_898e_356c9b572b26.slice PodAnnotations:0xc0013182b0}] Aliases:map[]}"
Jan 07 14:49:34.926581 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:34.919933628Z" level=info msg="Adding pod e2e-test-endpointslices-mirror-e2e-xkpr2_backend-1 to CNI network \"multus-cni-network\" (type=multus-shim)"
Jan 07 14:49:37.990628 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:37.988645879Z" level=info msg="Got pod network &{Name:backend-1 Namespace:e2e-test-endpointslices-mirror-e2e-xkpr2 ID:b8c0fc452705aac68f281fc7e4e337602aa77df8b0ddedf2b17539cc50e7e1b7 UID:290c600b-130a-4573-898e-356c9b572b26 NetNS:/var/run/netns/df611b18-48e8-4dfe-9096-ba74a2d12cff Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[] CgroupPath:kubepods-besteffort-pod290c600b_130a_4573_898e_356c9b572b26.slice PodAnnotations:0xc0013182b0}] Aliases:map[]}"
Jan 07 14:49:37.990628 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:37.988900801Z" level=info msg="Checking pod e2e-test-endpointslices-mirror-e2e-xkpr2_backend-1 for CNI network multus-cni-network (type=multus-shim)"
Jan 07 14:49:38.117346 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh kubenswrapper[2349]: I0107 14:49:38.079718 2349 kubelet.go:2428] "SyncLoop UPDATE" source="api" pods=["e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1"]
Jan 07 14:49:55.282006 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:55.281939847Z" level=info msg="Ran pod sandbox b8c0fc452705aac68f281fc7e4e337602aa77df8b0ddedf2b17539cc50e7e1b7 with infra container: e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1/POD" id=45b81d06-c9ff-4206-a5f8-cc79431bc78d name=/runtime.v1.RuntimeService/RunPodSandbox
Jan 07 14:49:55.924256 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:55.817208165Z" level=info msg="Creating container: e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1/agnhost-container" id=f9cc76b0-646c-4699-815a-533ac4d29b8c name=/runtime.v1.RuntimeService/CreateContainer
Jan 07 14:49:58.617721 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:58.616840803Z" level=info msg="Created container 91c2bdbace5a59a559858c483d76410505b22e02a2bff6ae80eeb3115f1cdeec: e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1/agnhost-container" id=f9cc76b0-646c-4699-815a-533ac4d29b8c name=/runtime.v1.RuntimeService/CreateContainer
Jan 07 14:49:59.083758 ci-op-j1hfshlr-5a3cc-xdxwn-worker-b-2vgbh crio[2317]: time="2025-01-07 14:49:59.076367235Z" level=info msg="Started container" PID=187918 containerID=91c2bdbace5a59a559858c483d76410505b22e02a2bff6ae80eeb3115f1cdeec description=e2e-test-endpointslices-mirror-e2e-xkpr2/backend-1/agnhost-container id=f74ca14b-3ffe-47db-9e40-3c36c6d03a02 name=/runtime.v1.RuntimeService/StartContainer sandboxID=b8c0fc452705aac68f281fc7e4e337602aa77df8b0ddedf2b17539cc50e7e1b7
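In the logs above, CRI-O accepts the RunPodSandbox request at 14:48:07 but only starts adding the pod to the CNI network at 14:49:34, roughly 87 seconds later. As a quick way to quantify that gap on other occurrences, here is a minimal sketch (not part of the original report) that scans a plain-text journal export; the pod name and file name are hypothetical placeholders:

#!/usr/bin/env python3
# Sketch: measure the delay between CRI-O's "Running pod sandbox" message
# and its "Adding pod ... to CNI network" message for a given pod.
# POD and JOURNAL are assumptions for illustration only.
from datetime import datetime

POD = "backend-1"
JOURNAL = "worker-journal.txt"  # hypothetical export, e.g. of `journalctl -u crio`

def ts(line: str) -> datetime:
    # Journal lines start with e.g. "Jan 07 14:48:07.796692"; the year is assumed.
    return datetime.strptime("2025 " + " ".join(line.split()[:3]), "%Y %b %d %H:%M:%S.%f")

run_sandbox = cni_add = None
with open(JOURNAL) as f:
    for line in f:
        if POD not in line:
            continue
        if "Running pod sandbox" in line and run_sandbox is None:
            run_sandbox = ts(line)   # CRI-O accepted the RunPodSandbox request
        elif "to CNI network" in line and cni_add is None:
            cni_add = ts(line)       # CRI-O began the CNI ADD for the sandbox

if run_sandbox and cni_add:
    print(f"RunPodSandbox -> CNI ADD delay: {(cni_add - run_sandbox).total_seconds():.1f}s")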
Other examples:
Slack thread:
https://redhat-internal.slack.com/archives/C01T0BK124T/p1736514156203329
This issue is causing the NetworkSegmentation tests to flake, which affects our ability to GA the feature in time.