-
Bug
-
Resolution: Unresolved
-
Major
-
4.20
-
Quality / Stability / Reliability
-
False
-
-
5
-
Moderate
-
None
-
None
-
Rejected
-
OCP Node Sprint 273 (Green), OCP Node Sprint 274 (green), OCP Node Sprint 275 (green), OCP Node Sprint 276 (green), OCP Node Sprint 277 (green)
-
5
-
In Progress
-
Release Note Not Required
Description of problem:
MicroShift CI contains a fully offline test VM (no network interfaces, container images baked into the bootc image). After bumping crio to 1.33, one of these tests started failing: one of the Pods (always the same one, openvino-resnet-predictor in the test-ai namespace) does not start after a host reboot.

Describing the Pod shows the following events. I think the first two are from before the reboot, because shutdown.target and systemd-reboot.service have jobs queued:

Warning FailedCreatePodContainer 4m21s (x2 over 4m22s) kubelet unable to ensure pod container exists: failed to create container for [kubepods burstable pod87f08872-82f7-4e59-9a4d-2842dd1926bd] : unable to start unit "kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice" (properties [{Name:Description Value:"libcontainer container kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice"} {Name:Wants Value:["kubepods-burstable.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Transaction for kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in transaction).

Warning FailedCreatePodContainer 4m10s kubelet unable to ensure pod container exists: failed to create container for [kubepods burstable pod87f08872-82f7-4e59-9a4d-2842dd1926bd] : unable to start unit "kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice" (properties [{Name:Description Value:"libcontainer container kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice"} {Name:Wants Value:["kubepods-burstable.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Transaction for kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice/start is destructive (systemd-reboot.service has 'start' job queued, but 'stop' is included in transaction).

Warning NetworkNotReady 3m6s (x5 over 3m12s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/cni/net.d/. Has your network provider started?

Normal SandboxChanged 2s (x16 over 3m4s) kubelet Pod sandbox changed, it will be killed and re-created.
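To separate the pre-reboot events from the post-reboot ones when triaging, the kubelet event messages can be filtered for the "is destructive" systemd error. The following is only a minimal sketch, not part of the MicroShift test suite: the function name find_shutdown_conflict and the SHUTDOWN_UNITS set are hypothetical, and the set lists only the two units seen in this report.

```python
import re

# Pattern for the systemd error text quoted in the kubelet events.
# The optional backslash covers copies where the quotes arrive escaped (\'start\').
DESTRUCTIVE_RE = re.compile(
    r"Transaction for (?P<unit>\S+)/start is destructive "
    r"\((?P<blocker>\S+) has \\?'start\\?' job queued"
)

# Assumption: only the two units observed in this report are treated as
# evidence that the host was already shutting down.
SHUTDOWN_UNITS = {"shutdown.target", "systemd-reboot.service"}


def find_shutdown_conflict(message):
    """Return (unit, blocker) if the event was rejected because a shutdown
    job was already queued, otherwise None."""
    match = DESTRUCTIVE_RE.search(message)
    if match is None or match.group("blocker") not in SHUTDOWN_UNITS:
        return None
    return match.group("unit"), match.group("blocker")
```

Events for which this returns a value were most likely emitted while the host was going down, so they can be set aside when looking for the post-reboot failure.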
crio log says:

Jun 26 10:59:46 el96-src-ai-model-serving-offline-bootc-host1 crio[1271]: time="2025-06-26T10:59:46.807917721Z" level=debug msg="Response error: failed to destroy network for pod sandbox k8s_openvino-resnet-predictor-59949b6d9c-p724p_test-ai_87f08872-82f7-4e59-9a4d-2842dd1926bd_0(cc8bf52d0066c7b5efaf430f09dc8a179a7249817b8483480f0643279ee248be): error removing pod test-ai_openvino-resnet-predictor-59949b6d9c-p724p from CNI network \"ovn-kubernetes\": plugin type=\"ovn-k8s-cni-overlay\" name=\"ovn-kubernetes\" failed (delete): CNI request failed with status 400: '[test-ai/openvino-resnet-predictor-59949b6d9c-p724p cc8bf52d0066c7b5efaf430f09dc8a179a7249817b8483480f0643279ee248be network default NAD default] [test-ai/openvino-resnet-predictor-59949b6d9c-p724p cc8bf52d0066c7b5efaf430f09dc8a179a7249817b8483480f0643279ee248be network default NAD default] failed to get container namespace for pod test-ai/openvino-resnet-predictor-59949b6d9c-p724p NAD default: failed to Statfs \"\": no such file or directory\n': stat netns path \"\": stat : no such file or directory" file="interceptors/interceptors.go:73" id=92c09e2f-03a1-4f0a-82cd-c8603930df9f name=/runtime.v1.RuntimeService/StopPodSandbox
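The notable detail in the crio message is that the network namespace path is empty (stat netns path ""). When scanning a long journal for other occurrences, a small filter like the one below may help. It is a sketch under the assumption that the message wording matches the line quoted above; the function name has_empty_netns_path is hypothetical.

```python
import re

# Matches 'stat netns path "<path>"'; the optional backslashes cover the
# escaped quotes (\"\") that appear inside crio's msg="..." field.
NETNS_RE = re.compile(r'stat netns path \\?"(?P<path>[^"\\]*)\\?"')


def has_empty_netns_path(log_line):
    """Return True if the log line reports a stat on an empty netns path."""
    match = NETNS_RE.search(log_line)
    return match is not None and match.group("path") == ""
```

Applied line by line to the journal, this flags every sandbox teardown that was attempted without a recorded network namespace path.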
Version-Release number of selected component (if applicable):
crio 1.33.1
How reproducible:
Always
Steps to Reproduce:
1. Fully offline MicroShift VM for AI Model Serving testing
2. Create an InferenceService which creates a Deployment and a Pod
3. Reboot the host
Actual results:
The Pod that ran fine before the reboot does not start
Expected results:
Pod starts normally
Additional info:
Journal with crio at normal log level: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-nightly/1939504337394864128/artifacts/e2e-aws-tests-bootc-nightly/openshift-microshift-e2e-metal-tests/artifacts/scenario-info/el96-src@ai-model-serving-offline/vms/host1/sos/journal_2025-06-30_03:38:43.log
SOS report that includes crio at debug log level: https://drive.google.com/file/d/1dgFIjpSh0-Q_kMWG99FYG-_yqNcaT7JB/view?usp=sharing
- blocks
USHIFT-5864 Offline test ai-model-serving fails after upgrading crio to 1.33 (Closed)