Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: CNV v4.18.8
Component/s: Storage Platform
Labels:
- chaos

Activity Type:
Quality / Stability / Reliability
Story Points:
0.42
Blocked:
False
Blocked Reason:

None
Ready:
False
Component Fix Version(s):
None
Market:

Severity:
Critical

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Description of problem:

An outage of 2 worker nodes hosting MON pods (2 out of 3), located in different ODF zones, causes some of the Windows VMs to crash and transition to recovery mode post chaos, after the network latency has been removed. The worker node outage was created by introducing 60 seconds of network latency.

Version-Release number of selected component (if applicable):

openshift-cnv                                      kubevirt-hyperconverged-operator.v4.18.8          OpenShift Virtualization           4.18.8                kubevirt-hyperconverged-operator.v4.18.3          Succeeded
openshift-ovirt-infra                              node-healthcheck-operator.v0.9.0                  Node Health Check Operator         0.9.0                 node-healthcheck-operator.v0.8.2                  Succeeded

How reproducible:

1. Label the worker nodes hosting MON pods with the label chaos=odf.
2. Introduce 60s of network latency for 15 minutes.
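The labeling step could be scripted roughly as follows. This is a minimal sketch, not the reporter's exact procedure: it assumes ODF runs in the openshift-storage namespace and that MON pods carry the Rook label app=rook-ceph-mon; the helper names mon_nodes and label_mon_nodes and the ODF_NAMESPACE variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: label every node that currently hosts a Ceph MON pod with chaos=odf.
# Assumptions: namespace openshift-storage and pod label app=rook-ceph-mon;
# adjust both if the cluster differs.

mon_nodes() {
  # Print the unique node names that host MON pods, one per line.
  oc get pods -n "${ODF_NAMESPACE:-openshift-storage}" -l app=rook-ceph-mon \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u
}

label_mon_nodes() {
  # Apply the chaos=odf label that the chaos run selects on.
  local node
  for node in $(mon_nodes); do
    oc label node "$node" chaos=odf --overwrite
  done
}

# Usage against a live cluster:
#   label_mon_nodes
```

The functions only define behavior; nothing touches the cluster until label_mon_nodes is called explicitly.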

Steps to Reproduce:

1. Label the worker nodes hosting MON pods with the label chaos=odf.

2. Run the chaos command to cause 60s of network latency for 15 minutes:

podman run --rm -e LABEL_SELECTOR="chaos=odf" -e INSTANCE_COUNT=2 -e DURATION=900 -e TRAFFIC_TYPE=egress -e EGRESS='{latency: 60000ms}' -e KUBECONFIG=/tmp/config -e KRKN_KUBE_CONFIG=/tmp/config -e DISTRIBUTION='openshift' --net=host -v /tmp/config:/tmp/config:Z quay.io/krkn-chaos/krkn-hub:network-chaos
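For readers adjusting the scenario, the sketch below shows how the DURATION and EGRESS values in the command relate to the "60s for 15 minutes" wording. It assumes, as the report implies, that DURATION is expressed in seconds and the latency in milliseconds; the helper name chaos_env is hypothetical and not part of krkn-hub.

```shell
#!/usr/bin/env bash
# Sketch: derive the DURATION and EGRESS values from human-friendly units.
# Assumption (inferred from the report): DURATION is seconds, latency is ms.

chaos_env() {
  # $1 = latency in seconds, $2 = duration in minutes
  local latency_ms=$(( $1 * 1000 ))
  local duration_s=$(( $2 * 60 ))
  printf 'DURATION=%s\n' "$duration_s"
  printf 'EGRESS={latency: %sms}\n' "$latency_ms"
}

# The values used in this report: 60s of latency for 15 minutes.
chaos_env 60 15
```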

Actual results:

17 out of 255 Windows VMs were not accessible through SSH. Of the 17, some showed a blank screen and most were at the Windows recovery screen post chaos, after the MON worker nodes had recovered.

Expected results:

All the Windows VMs should be accessible through SSH post chaos.
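A check like the one below could be used to count inaccessible VMs after the chaos run. It is a sketch under assumptions: the VM addresses are passed as arguments, key-based SSH login is already configured, and the Administrator user name, the SSH_USER variable, and the helper names ssh_ok and count_unreachable are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count VMs whose SSH endpoint does not answer after the chaos run.
# Assumption: hosts (IPs or names) are given as arguments and accept
# key-based login for the hypothetical user in SSH_USER.

ssh_ok() {
  # Succeeds only if a non-interactive SSH session opens within 10 seconds.
  ssh -o BatchMode=yes -o ConnectTimeout=10 -o StrictHostKeyChecking=no \
    "${SSH_USER:-Administrator}@$1" exit >/dev/null 2>&1
}

count_unreachable() {
  # Print each unreachable host, then a summary line "unreachable: N of M".
  local host failed=0 total=0
  for host in "$@"; do
    total=$((total + 1))
    if ! ssh_ok "$host"; then
      echo "UNREACHABLE $host"
      failed=$((failed + 1))
    fi
  done
  echo "unreachable: $failed of $total"
}

# Usage:
#   count_unreachable 10.0.0.11 10.0.0.12 10.0.0.13
```

A host that refuses SSH is only reported as unreachable; confirming whether it sits at a blank screen or the recovery screen still requires the VM console.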

Additional info:


Screenshot from 2025-08-01 23-09-09.png (161 kB, 2025/08/01 5:42 PM)
Screenshot from 2025-08-01 23-22-44.png (62 kB, 2025/08/01 6:57 PM)
Screenshot from 2025-08-01 23-33-53.png (54 kB, 2025/08/01 6:58 PM)
windows-vm-20b6146d-22 (40 kB, 2025/08/01 5:22 PM)
windows-vm-20b6146d-41 (41 kB, 2025/08/01 5:22 PM)

is cloned by

RHEL-108735 Windows BSOD with nbd read and write delay

Release Pending

split to

RHEL-107617 2 worker node outage with MON pods causes windows VMs to crash post chaos

Closed

Assignee: Adam Litke

Reporter: Yogananth Subramanian

QA Contact: Natalie Gavrielov

Votes: 0

Watchers: 8

Created: 2025/07/16 10:35 AM

Updated: 2025/08/12 2:41 AM
