OCPBUGS-58229 (OpenShift Bugs)

MicroShift: Pod in offline scenario does not start after reboot after bumping CRIO to 1.33.1


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.20
    • 4.20
    • Node / CRI-O
    • Quality / Stability / Reliability
    • False
    • 5
    • Moderate
    • None
    • None
    • Rejected
    • OCP Node Sprint 273 (Green), OCP Node Sprint 274 (green), OCP Node Sprint 275 (green), OCP Node Sprint 276 (green), OCP Node Sprint 277 (green)
    • 5
    • In Progress
    • Release Note Not Required

      Description of problem:

      MicroShift CI contains a fully offline test VM (no network interfaces, container images baked into the bootc image).
      After bumping CRI-O to 1.33, one of these tests started failing: one Pod (always the same one, openvino-resnet-predictor in the test-ai namespace) does not start after a host reboot.
      
      Describing the Pod shows the following events. The first two most likely predate the reboot, because they report queued shutdown.target and systemd-reboot.service jobs:
      
        Warning  FailedCreatePodContainer  4m21s (x2 over 4m22s)  kubelet            unable to ensure pod container exists: failed to create container for [kubepods burstable pod87f08872-82f7-4e59-9a4d-2842dd1926bd] : unable to start unit "kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice" (properties [{Name:Description Value:"libcontainer container kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice"} {Name:Wants Value:["kubepods-burstable.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Transaction for kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice/start is destructive (shutdown.target has \'start\' job queued, but \'stop\' is included in transaction).
        Warning  FailedCreatePodContainer  4m10s                  kubelet            unable to ensure pod container exists: failed to create container for [kubepods burstable pod87f08872-82f7-4e59-9a4d-2842dd1926bd] : unable to start unit "kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice" (properties [{Name:Description Value:"libcontainer container kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice"} {Name:Wants Value:["kubepods-burstable.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Transaction for kubepods-burstable-pod87f08872_82f7_4e59_9a4d_2842dd1926bd.slice/start is destructive (systemd-reboot.service has \'start\' job queued, but \'stop\' is included in transaction).
        Warning  NetworkNotReady           3m6s (x5 over 3m12s)   kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/cni/net.d/. Has your network provider started?
        Normal   SandboxChanged            2s (x16 over 3m4s)     kubelet            Pod sandbox changed, it will be killed and re-created.
      ----
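      To find the Pod's cgroup slice in the journal, it helps to derive the unit name from the Pod UID. The sketch below only mirrors the naming pattern visible in the events above (dashes in the UID replaced by underscores, prefixed with kubepods-burstable-pod). The function name and the qos_class parameter are hypothetical, and values other than "burstable" are an assumption not confirmed by this report.

```python
def pod_slice_name(pod_uid: str, qos_class: str = "burstable") -> str:
    """Build the systemd slice unit name for a Pod, as seen in the kubelet events.

    Pattern taken from the events in this report:
        kubepods-<qos>-pod<uid with '-' replaced by '_'>.slice
    Assumption: only the "burstable" case is confirmed by the report.
    """
    escaped_uid = pod_uid.replace("-", "_")
    return f"kubepods-{qos_class}-pod{escaped_uid}.slice"


# UID of the failing Pod, copied from the events above.
slice_name = pod_slice_name("87f08872-82f7-4e59-9a4d-2842dd1926bd")
print(slice_name)
```

      The resulting name can then be used as a search string in the journal or SOS report.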
      
      The CRI-O log shows:
      
      Jun 26 10:59:46 el96-src-ai-model-serving-offline-bootc-host1 crio[1271]: time="2025-06-26T10:59:46.807917721Z" level=debug msg="Response error: failed to destroy network for pod sandbox k8s_openvino-resnet-predictor-59949b6d9c-p724p_test-ai_87f08872-82f7-4e59-9a4d-2842dd1926bd_0(cc8bf52d0066c7b5efaf430f09dc8a179a7249817b8483480f0643279ee248be): error removing pod test-ai_openvino-resnet-predictor-59949b6d9c-p724p from CNI network \"ovn-kubernetes\": plugin type=\"ovn-k8s-cni-overlay\" name=\"ovn-kubernetes\" failed (delete): CNI request failed with status 400: '[test-ai/openvino-resnet-predictor-59949b6d9c-p724p cc8bf52d0066c7b5efaf430f09dc8a179a7249817b8483480f0643279ee248be network default NAD default] [test-ai/openvino-resnet-predictor-59949b6d9c-p724p cc8bf52d0066c7b5efaf430f09dc8a179a7249817b8483480f0643279ee248be network default NAD default] failed to get container namespace for pod test-ai/openvino-resnet-predictor-59949b6d9c-p724p NAD default: failed to Statfs \"\": no such file or directory\n': stat netns path \"\": stat : no such file or directory" file="interceptors/interceptors.go:73" id=92c09e2f-03a1-4f0a-82cd-c8603930df9f name=/runtime.v1.RuntimeService/StopPodSandbox
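      Lines like the one above are long, so when scanning the debug journal it can be easier to split them into key=value fields first. The following is a minimal sketch, assuming the entries keep the key=value layout shown above, with double-quoted values that use \" escapes. The function names are hypothetical and not part of CRI-O.

```python
import re

# A field is key=value, where value is either a double-quoted string
# (which may contain backslash escapes such as \") or a bare token.
_FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_crio_fields(line: str) -> dict[str, str]:
    """Split one CRI-O journal line into its key=value fields."""
    fields = {}
    for key, raw in _FIELD_RE.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def is_stop_sandbox_error(fields: dict[str, str]) -> bool:
    """True for error responses to the StopPodSandbox RPC."""
    return (
        fields.get("name", "").endswith("/StopPodSandbox")
        and fields.get("msg", "").startswith("Response error")
    )
```

      Applying is_stop_sandbox_error to every line of the journal narrows the output to failed StopPodSandbox calls such as the one quoted above.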

      Version-Release number of selected component (if applicable):

          crio 1.33.1

      How reproducible:

          Always

      Steps to Reproduce:

          1. Start a fully offline MicroShift VM for AI Model Serving testing
          2. Create an InferenceService, which creates a Deployment and a Pod
          3. Reboot the host

      Actual results:

          The Pod that ran normally before the reboot does not start afterwards

      Expected results:

          Pod starts normally
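      The expected result can be checked after the reboot by inspecting container readiness in the Pod list. The sketch below is an assumption about how such a check might look, not the check used by the MicroShift test suite. It takes the JSON text of a Pod list (for example the output of `oc get pods -n test-ai -o json`) and returns the Pods whose containers are not all ready. The function name is hypothetical.

```python
import json


def pods_not_ready(pod_list_json: str) -> list[str]:
    """Return names of Pods that have no container statuses or any unready container."""
    items = json.loads(pod_list_json).get("items", [])
    not_ready = []
    for pod in items:
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # A Pod counts as ready only if it reports at least one container
        # and every reported container is ready.
        all_ready = bool(statuses) and all(cs.get("ready", False) for cs in statuses)
        if not all_ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready
```

      Readiness is used here rather than the Pod phase on the assumption that a Pod stuck in sandbox re-creation may still report its earlier phase.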

      Additional info:

      Journal with CRI-O at the normal log level:
      
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-microshift-release-4.20-periodics-e2e-aws-tests-bootc-nightly/1939504337394864128/artifacts/e2e-aws-tests-bootc-nightly/openshift-microshift-e2e-metal-tests/artifacts/scenario-info/el96-src@ai-model-serving-offline/vms/host1/sos/journal_2025-06-30_03:38:43.log
      
      
      SOS report that includes crio with log level debug:
      
      https://drive.google.com/file/d/1dgFIjpSh0-Q_kMWG99FYG-_yqNcaT7JB/view?usp=sharing    

              skunkerk Sohan Kunkerkar
              pmatusza@redhat.com Patryk Matuszak
              None
              None
              Aditi Sahay Aditi Sahay
              None