[OCPBUGS-25812] [OCP 4.15] VM stuck in terminating state after OCP node crash

    • Important
    • No
    • x86_64
    • False
    • Customer Escalated

      Description of problem:

      After a manual crash of an OCP node, the OSPD VM running on that node is stuck in the Terminating state.

      Version-Release number of selected component (if applicable):

      OCP 4.12.15 
      osp-director-operator.v1.3.0
      kubevirt-hyperconverged-operator.v4.12.5

      How reproducible:

      Log in to an OCP 4.12.15 node that is running a VM.
      Manually crash the master node.
      After the reboot, the VM stays in the Terminating state.
      

      Steps to Reproduce:

          1. ssh core@masterX 
          2. sudo su
          3. echo c > /proc/sysrq-trigger     

      Actual results:

      After the reboot, the VM stays in the Terminating state.
      
      
      $ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
      NAME                               STATUS   ROLES                         AGE   VERSION
      model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
      
      
      $ omc get pod -n openstack 
      NAME                                                        READY   STATUS         RESTARTS   AGE
      openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
      openstackclient                                             1/1     Running        0          7h
      osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
      osp-director-operator-index-qxxvw                           1/1     Running        0          8h
      virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
      virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
      virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d
      
      $ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
      Status:                    Terminating (lasts 37h)
      
      $ xsos sosreport-xxxx/|grep time
      ...
        Boot time: Wed Nov 22 01:44:11 AM UTC 2023
        Uptime:    8:27,  0 users
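
The pod listing above can be narrowed to the affected pod with a small filter. This is a minimal sketch, not part of the original report: the function name `not_ready_pods` is hypothetical, and it assumes the default table layout shown above (NAME, READY, STATUS as the first three columns).

```shell
#!/bin/sh
# Hypothetical helper: reads `get pod` table output on stdin and prints
# NAME, READY and STATUS for every pod that is not fully ready.
# Assumes the default column order: NAME READY STATUS ...
not_ready_pods() {
  # NR > 1 skips the header row; NF >= 3 skips blank or short lines.
  # Completed pods are ignored because they legitimately report 0/N ready.
  awk 'NR > 1 && NF >= 3 && $3 != "Completed" {
         split($2, ready, "/")
         if ($3 != "Running" || ready[1] != ready[2])
           print $1, $2, $3
       }'
}
```

Usage, under the same assumptions: `omc get pod -n openstack | not_ready_pods`. A pod that shows up here should still be checked with `omc describe pod` as above to confirm that it is in the Terminating state.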
        

      Expected results:

      The VM restarts automatically, or at least does not stay in the Terminating state.

      Additional info:

      The issue has been seen twice.

      The first time, a kernel crash occurred and the VM on the affected node was left in the Terminating state.

      The second time, we tried to reproduce the issue by manually crashing the kernel and got the same result:
      the VM running on the OCP node stays in the Terminating state.


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.15.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:7198


            Kathryn Alexander added a comment -

            The docs team is preparing the bug text for the 4.15 release notes. Based on the fix and affects version, this bug needs to be included in the release notes. Please update your issue by 2/12.

            Set the Release Note Type to Bug Fix and provide the Release Note Text in the following format:

            Cause: What actions or circumstances cause this bug to present.
            Consequence: What happens when the bug presents.
            Fix: What was done to fix the bug.
            Result: Bug doesn’t present anymore.

            If your bug was actually found and fixed in 4.15 or should be internal only, set the Release Note Type to Release Note Not Required.

            OpenShift Jira Bot added a comment -

            Hi hekumar@redhat.com,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            OpenShift Jira Bot added a comment -

            Looks like this bug is far enough along in the workflow that a code fix is ready. Customers and support need to know the backport plan. Please complete the "Target Backport Versions" field to indicate which version(s) will receive the fix.

              hekumar@redhat.com Hemant Kumar
              rhn-support-jpeyrard Johann Peyrard
              Wei Duan Wei Duan