Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: 4.13.z
Affects Version/s: 4.12.z
Component/s: Storage / Kubernetes
Labels:
- FastFix
- csi
- finalizer
- kubevirt

Severity: Important
Regression: No
Architecture: x86_64
Blocked: False
Blocked Reason: None
Customer Impact: Customer Escalated
Target Version: 4.13.z

Description of problem:

After a manual crash of an OCP node, the OSPD VM running on that node is stuck in the Terminating state.

Version-Release number of selected component (if applicable):

OCP 4.12.15 
osp-director-operator.v1.3.0
kubevirt-hyperconverged-operator.v4.12.5

How reproducible:

Log in to an OCP 4.12.15 node that is running a VM.
Manually crash the master node.
After the reboot, the VM stays in the Terminating state.

Steps to Reproduce:

    1. ssh core@masterX 
    2. sudo su
    3. echo c > /proc/sysrq-trigger

Actual results:

After the reboot, the VM stays in the Terminating state.


$ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
NAME                               STATUS   ROLES                         AGE   VERSION
model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08


$ omc get pod -n openstack 
NAME                                                        READY   STATUS         RESTARTS   AGE
openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
openstackclient                                             1/1     Running        0          7h
osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
osp-director-operator-index-qxxvw                           1/1     Running        0          8h
virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d

$ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
Status:                    Terminating (lasts 37h)

$ xsos sosreport-xxxx/|grep time
...
  Boot time: Wed Nov 22 01:44:11 AM UTC 2023
  Uptime:    8:27,  0 users
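
The stuck pod can also be picked out programmatically. The following is a minimal sketch, not part of the original report, assuming the pod list has been saved as JSON (for example with `oc get pods -n openstack -o json`). The function names and the one-hour threshold are hypothetical. It relies on `metadata.deletionTimestamp`, which the Kubernetes API sets on an object once its deletion has been requested; a pod that still exists long after that time is what the tools above display as Terminating.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse a timestamp in the form the Kubernetes API emits, e.g. 2023-11-20T21:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_terminating(pod_list, now, threshold=timedelta(hours=1)):
    """Return (pod name, time past deletionTimestamp) for pods that are overdue for deletion.

    pod_list is the parsed JSON of a pod list (a dict with an "items" array).
    The threshold is an arbitrary illustrative value, not a Kubernetes default.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        deleted_at = meta.get("deletionTimestamp")
        if deleted_at is None:
            continue  # deletion was never requested for this pod
        overdue = now - parse_k8s_time(deleted_at)
        if overdue > threshold:
            stuck.append((meta.get("name"), overdue))
    return stuck
```

To use it, load the saved output with `json.load()` and call `stuck_terminating(pods, datetime.now(timezone.utc))`. Any pod it returns is a candidate for the behaviour described in this report, although a long termination can have other causes, so the result is a starting point for investigation rather than a diagnosis.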

Expected results:

The VM restarts automatically, or at least does not stay in the Terminating state.

Additional info:

The issue has been seen twice.

The first time, a kernel crash occurred on the node and the VM running on that node was left in the Terminating state.

The second time, we tried to reproduce the issue by manually crashing the kernel and got the same result:
the VM running on the OCP node stays in the Terminating state.

blocks

OCPBUGS-25815 [OCP 4.12] VM stuck in terminating state after OCP node crash

Closed

clones

OCPBUGS-25813 [OCP 4.14] VM stuck in terminating state after OCP node crash

Closed

is blocked by

OCPBUGS-25813 [OCP 4.14] VM stuck in terminating state after OCP node crash

Closed

is cloned by

OCPBUGS-25815 [OCP 4.12] VM stuck in terminating state after OCP node crash

Closed

links to

openshift/kubernetes#1831: OCPBUGS-25814: Fix device uncertain errors on reboot - 4.13

RHBA-2024:0286 OpenShift Container Platform 4.13.z bug fix update


Assignee: Hemant Kumar

Reporter: Johann Peyrard

QA Contact: Wei Duan

Votes: 0

Watchers: 5

Created: 2023/12/21 3:25 PM

Updated: 2024/01/24 5:55 AM

Resolved: 2024/01/24 5:55 AM
