Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.14.0
Affects Version/s: 4.14
Component/s: HyperShift
Labels:

Regression:
No
Release Blocker:
Proposed
Blocked:
False
Blocked Reason:

None
Target Version:

4.14.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:


On August 24th, a bugfix was merged into the hypershift repo to address OCPBUGS-16813 (https://github.com/openshift/hypershift/pull/2942). This changed the konnectivity server deployment within the HCP namespace: we went from a single konnectivity server to multiple servers when HA HCPs are in use.

The konnectivity agents on the HCP worker nodes connect to the server through a route. The agents are supposed to discover all the HA konnectivity servers through round-robin load balancing: in theory, if an agent connects to the route endpoint enough times, it should eventually discover every server.
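To illustrate the discovery theory, here is a minimal simulation sketch. It is not hypershift or konnectivity code: the server names, the `discover_servers` helper, and the "sticky" balancer (which pins a client to one backend) are all hypothetical and only contrast the two behaviours. Whether sticky balancing is the exact mechanism behind this bug is an assumption.

```python
import itertools


def discover_servers(pick_backend, attempts):
    """Return the set of servers an agent has reached after `attempts` connections."""
    return {pick_backend() for _ in range(attempts)}


def round_robin(servers):
    """Balancer that hands out backends in rotation."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


def sticky(servers, client_id):
    """Balancer that always maps the same client to the same backend (assumed behaviour)."""
    index = sum(client_id.encode()) % len(servers)
    return lambda: servers[index]


# Hypothetical names for three HA konnectivity server replicas.
SERVERS = ["konnectivity-server-0", "konnectivity-server-1", "konnectivity-server-2"]

rr_seen = discover_servers(round_robin(SERVERS), attempts=6)
sticky_seen = discover_servers(sticky(SERVERS, "worker-node-a"), attempts=6)

print(f"round robin discovered {len(rr_seen)} of {len(SERVERS)} servers")
print(f"sticky discovered {len(sticky_seen)} of {len(SERVERS)} servers")
```

With rotation, repeated connections reach every replica. With a balancer that pins the client, no number of retries finds the other servers, which matches the symptom of agents discovering only a single konnectivity server.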

With the kubevirt platform, the agents on the worker nodes discover only a single konnectivity server, so the kas on the HCP cannot reliably contact kubelets on the worker nodes.

As a result, webhooks (and any other connections that require the kas (API server) in the HCP to contact worker nodes) fail the majority of the time.
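A rough back-of-envelope sketch of why failures would dominate. It assumes (this is not stated in the report) that each kas request lands on one of the konnectivity servers with equal probability, that it only succeeds if the target node's agent is connected to that server, and that there are three replicas. `tunnel_success_rate` is a hypothetical helper.

```python
from fractions import Fraction


def tunnel_success_rate(total_servers, servers_with_agent):
    """Fraction of requests that land on a server the node's agent is connected to.

    Assumes requests are spread uniformly across all servers.
    """
    if total_servers <= 0 or not 0 <= servers_with_agent <= total_servers:
        raise ValueError("need 0 <= servers_with_agent <= total_servers and total_servers > 0")
    return Fraction(servers_with_agent, total_servers)


# Assumed: 3 konnectivity servers, agent connected to only 1 of them.
rate = tunnel_success_rate(total_servers=3, servers_with_agent=1)
print(f"success: {rate}, failure: {1 - rate}")
```

Under those assumptions only about a third of connections would succeed, which is consistent with "fail the majority of the time". Once every agent is connected to every server the rate goes to 1.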

Version-Release number of selected component (if applicable):

How reproducible:


Create a kubevirt platform HCP using the `hcp` CLI tool. This defaults to HA mode, and the cluster will never fully roll out. The ingress, monitoring, and console clusteroperators will flap between failing and succeeding. Usually we'll see an error about webhook connectivity failing.

During this time, any `oc` command that attempts to tunnel a connection through the kas to the kubelets will fail the majority of the time. This means `oc logs`, `oc exec`, etc. will not work.


Actual results:

kas -> kubelet connections are unreliable

Expected results:


kas -> kubelet connections are reliable

Additional info:

links to

openshift/hypershift#2971: OCPBUGS-18336: make konnectivity routes roundrobin

RHSA-2023:5006 OpenShift Container Platform 4.14.z security update

mentioned on

Merge request - Bump IBM integration to our latest prod image.

Assignee: Seth Jennings

Reporter: David Vossel

QA Contact: Liangquan Li

Votes: 0

Watchers: 13

Created: 2023/08/30 12:37 PM

Updated: 2024/04/29 5:08 PM

Resolved: 2023/10/31 1:42 PM
