
[OCP 4.15] VM stuck in terminating state after OCP node crash

    • Important
    • No
    • x86_64
    • False
    • Customer Escalated

      Description of problem:

      After a manual crash of an OCP node, the OSPD VM running on that node is stuck in the Terminating state.

      Version-Release number of selected component (if applicable):

      OCP 4.12.15 
      osp-director-operator.v1.3.0
      kubevirt-hyperconverged-operator.v4.12.5

      How reproducible:

      Log in to an OCP 4.12.15 node running a VM.
      Manually crash the master node.
      After the reboot, the VM stays in the Terminating state.
      

      Steps to Reproduce:

          1. ssh core@masterX 
          2. sudo su
          3. echo c > /proc/sysrq-trigger     

      Actual results:

      After the reboot, the VM stays in the Terminating state.
      
      
      $ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
      NAME                               STATUS   ROLES                         AGE   VERSION
      model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      
      
      $ omc get pod -n openstack 
      NAME                                                        READY   STATUS         RESTARTS   AGE
      openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
      openstackclient                                             1/1     Running        0          7h
      osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
      osp-director-operator-index-qxxvw                           1/1     Running        0          8h
      virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
      virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
      virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d
      
      $ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
      Status:                    Terminating (lasts 37h)
      
      $ xsos sosreport-xxxx/|grep time
      ...
        Boot time: Wed Nov 22 01:44:11 AM UTC 2023
        Uptime:    8:27,  0 users
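
In the listing above, the stuck pod shows `NodeAffinity` in the STATUS column, while `omc describe` reports it as Terminating. A minimal sketch of a filter that flags such pods from the table output, assuming the column layout shown above (NAME, READY, STATUS, ...). The function name `not_running_launchers` is hypothetical and is not part of `omc` or `oc`. It treats any virt-launcher pod whose STATUS is not `Running` as suspect, so it also flags pods in states such as `Completed`.

```shell
# Hypothetical helper (not part of omc/oc): reads `omc get pod` or
# `oc get pod` table output on stdin and prints the names of
# virt-launcher pods whose STATUS column is anything other than "Running".
not_running_launchers() {
  # NR > 1 skips the header row; $1 is NAME, $3 is STATUS.
  awk 'NR > 1 && $1 ~ /^virt-launcher-/ && $3 != "Running" { print $1 }'
}
```

Possible usage against the same must-gather: `omc get pod -n openstack | not_running_launchers`, followed by `omc describe pod <name> | grep Status:` for each name printed.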
        

      Expected results:

      The VM either restarts automatically or does not stay in the Terminating state.

      Additional info:

      The issue has been seen twice.
      
      The first time, a kernel crash occurred and the VM on the affected node was left in the Terminating state.
      
      The second time, we tried to reproduce the issue by manually crashing the kernel and got the same result:
      the VM running on the OCP node stayed in the Terminating state.

            [OCPBUGS-25812] [OCP 4.15] VM stuck in terminating state after OCP node crash

            Hemant Kumar created issue -
            Hemant Kumar made changes -
            Link New: This issue clones OCPBUGS-23896 [ OCPBUGS-23896 ]
            OpenShift Jira Bot made changes -
            Assignee Original: Hemant Kumar [ hekumar@redhat.com ]
            Hemant Kumar made changes -
            Target Version Original: 4.16.0 [ 12417855 ] New: 4.15.0 [ 12407353 ]
            OpenShift Prow Bot made changes -
            Remote Link New: This issue links to "openshift/kubernetes#1832: OCPBUGS-25812: Fix device uncertain errors on reboot - 4.15 (Web Link)" [ 1510834 ]
            Hemant Kumar made changes -
            Link New: This issue is blocked by OCPBUGS-23896 [ OCPBUGS-23896 ]
            Hemant Kumar made changes -
            Labels Original: csi finalizer kubevirt New: FastFix csi finalizer kubevirt
            Hemant Kumar made changes -
            Link New: This issue is cloned by OCPBUGS-25813 [ OCPBUGS-25813 ]
            Hemant Kumar made changes -
            Link New: This issue blocks OCPBUGS-25813 [ OCPBUGS-25813 ]
            Fabio Bertinatto made changes -
            Assignee New: Hemant Kumar [ hekumar@redhat.com ]
            Fabio Bertinatto made changes -
            Status Original: New [ 10016 ] New: ASSIGNED [ 14452 ]
            OpenShift Prow Bot made changes -
            Status Original: ASSIGNED [ 14452 ] New: POST [ 15726 ]
            OpenShift Prow Bot made changes -
            Status Original: POST [ 15726 ] New: MODIFIED [ 14454 ]
            ART Bot made changes -
            Status Original: MODIFIED [ 14454 ] New: ON_QA [ 15723 ]
            Wei Duan made changes -
            Status Original: ON_QA [ 15723 ] New: Verified [ 10015 ]
            OpenShift Release-Controller Bot made changes -
            Fix Version/s New: 4.15.0 [ 12407353 ]
            Errata Tool made changes -
            Remote Link New: This issue links to "RHSA-2023:7198 (Web Link)" [ 1533210 ]
            Errata Tool made changes -
            Resolution New: Done-Errata [ 10803 ]
            Status Original: Verified [ 10015 ] New: Closed [ 6 ]

              hekumar@redhat.com Hemant Kumar
              rhn-support-jpeyrard Johann Peyrard
              Wei Duan Wei Duan
