Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: 4.14.0
Affects Version/s: 4.14.0
Component/s: Cloud Compute / KubeVirt Provider
Labels:

Severity: Critical
Regression: No
Blocked: False
Blocked Reason: None
Target Version: 4.14.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The HyperShift KubeVirt (OpenShift Virtualization) platform has worker nodes that are hosted by KubeVirt virtual machines. The worker node's internal IP address is determined by inspecting the KubeVirt VMI's vmi.status.interfaces field.

Because the vmi.status.interfaces field sources its information from the qemu guest agent, the field is not guaranteed to remain static in some scenarios, such as a soft reboot or when the qemu agent is temporarily unavailable. In these situations, the interfaces list will be empty.

When the interfaces list is empty on the VMI, HyperShift-related components (cloud-provider-kubevirt and cluster-api-provider-kubevirt) strip the worker node's internal IP. Stripping the node's internal IP causes unpredictable behavior that results in connectivity failures from the KAS to the worker node kubelets.

To address this, the HyperShift-related KubeVirt components need to update the internal IP of worker nodes only when the vmi.status.interfaces list has an IP for the default interface. Otherwise, these HyperShift components should use the last known internal IP address rather than stripping the internal IP address from the node.
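
The proposed behavior can be illustrated with a minimal sketch. This is not the code from cloud-provider-kubevirt or cluster-api-provider-kubevirt; the `Interface` type, the `resolve_internal_ip` function, and the interface name "default" are all hypothetical names chosen for illustration. It only shows the rule described above: accept a new IP when the default interface reports one, otherwise keep the last known value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    """Hypothetical stand-in for one entry of vmi.status.interfaces."""
    name: str
    ip: str = ""


def resolve_internal_ip(
    interfaces: List[Interface],
    last_known_ip: Optional[str],
    default_name: str = "default",  # assumed name of the default interface
) -> Optional[str]:
    """Return the internal IP to record on the worker node.

    A new IP is accepted only when the default interface reports one.
    If the list is empty (for example, while the guest agent is
    unavailable) or the default interface has no IP, the last known
    IP is kept instead of being stripped.
    """
    for iface in interfaces:
        if iface.name == default_name and iface.ip:
            return iface.ip
    return last_known_ip


if __name__ == "__main__":
    # Normal case: the default interface reports an IP.
    print(resolve_internal_ip([Interface("default", "10.0.0.5")], None))
    # Guest agent unavailable: the list is empty, so the old IP is kept.
    print(resolve_internal_ip([], "10.0.0.5"))
```

The key design choice in this sketch is that an empty or incomplete interfaces list is treated as "no new information" rather than "the node has no IP".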

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100% given enough time and the right environment.

Steps to Reproduce:

1. Create a HyperShift KubeVirt guest cluster.
2. Run the CSI conformance test suite in a loop (this test suite causes the vmi.status.interfaces list to become unstable briefly at times).

Actual results:

The CSI test suite will eventually begin failing due to an inability to pod exec into worker node pods. This is caused by the node's internal IP being removed.

Expected results:

The CSI conformance test suite should pass reliably.

Additional info:

depends on

OCPBUGS-19393 Unstable node internal IP causes connection errors for KubeVirt platform

Closed

is cloned by

OCPBUGS-19393 Unstable node internal IP causes connection errors for KubeVirt platform

Closed

links to

openshift/cloud-provider-kubevirt#26: [release-4.14] OCPBUGS-19020: Auto sync upstream 2023 09 15 20 36

RHSA-2023:5006 OpenShift Container Platform 4.14.z security update

Assignee: David Vossel

Reporter: David Vossel

QA Contact: Yu Li

Votes: 0

Watchers: 6

Created: 2023/09/14 4:38 PM

Updated: 2023/10/31 1:43 PM

Resolved: 2023/10/31 1:43 PM
