OpenShift Bugs / OCPBUGS-23896

[OCP 4.16] VM stuck in terminating state after OCP node crash

Details

    • Important
    • No
    • x86_64
    • False
    • Customer Escalated

    Description

      Description of problem:

      After a manual crash of an OCP node, the OSPD VM that was running on that node is stuck in the Terminating state.

      Version-Release number of selected component (if applicable):

      OCP 4.12.15 
      osp-director-operator.v1.3.0
      kubevirt-hyperconverged-operator.v4.12.5

      How reproducible:

      Log in to an OCP 4.12.15 node running a VM.
      Manually crash the master node.
      After the reboot, the VM stays in the Terminating state.
      

      Steps to Reproduce:

          1. ssh core@masterX 
          2. sudo su
          3. echo c > /proc/sysrq-trigger     

      Actual results:

      After the reboot, the VM stays in the Terminating state.
      
      
      $ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
      NAME                               STATUS   ROLES                         AGE   VERSION
      model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      
      
      $ omc get pod -n openstack 
      NAME                                                        READY   STATUS         RESTARTS   AGE
      openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
      openstackclient                                             1/1     Running        0          7h
      osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
      osp-director-operator-index-qxxvw                           1/1     Running        0          8h
      virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
      virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
      virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d
      
      $ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
      Status:                    Terminating (lasts 37h)
      
      $ xsos sosreport-xxxx/|grep time
      ...
        Boot time: Wed Nov 22 01:44:11 AM UTC 2023
        Uptime:    8:27,  0 users
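
      The pod listing above can be screened for the affected pod with a small filter. This is only a sketch: `list_unhealthy_pods` is a hypothetical helper, not part of `omc` or `oc`, and it assumes the default table layout shown above (NAME, READY, STATUS in the first three columns).

```shell
# Hypothetical helper (not an omc/oc feature): reads the table printed by
# `omc get pod` or `oc get pod` on stdin and prints the name of every pod
# whose READY count is below its container total, or whose STATUS is not
# Running. Completed pods are skipped because they are expected to be 0/N.
list_unhealthy_pods() {
  awk '
    $1 == "NAME" { next }          # skip the header row
    NF < 3       { next }          # skip blank or malformed lines
    $3 == "Completed" { next }     # finished pods are not a problem
    {
      split($2, ready, "/")        # READY column looks like "1/1"
      if (ready[1] + 0 < ready[2] + 0 || $3 != "Running") print $1
    }
  '
}
```

      Assumed usage: `omc get pod -n openstack | list_unhealthy_pods`. The filter only flags candidates; the Terminating status itself still has to be confirmed with `omc describe pod`, as shown above.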
        

      Expected results:

      The VM restarts automatically, or at least does not stay in the Terminating state.

      Additional info:

      The issue has been seen twice.
      
      The first time, the kernel crashed and the VM on that node was left in the Terminating state.
      
      The second time, we tried to reproduce the issue by manually crashing the kernel and got the same result:
      the VM running on the OCP node stays in the Terminating state.

            People

              hekumar@redhat.com Hemant Kumar
              rhn-support-jpeyrard Johann Peyrard
              Wei Duan Wei Duan