Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.15.0
Affects Version/s: 4.14.z, 4.15.0
Component/s: HyperShift
Labels:

Severity: Moderate
Regression: No
Sprint: Hypershift Sprint 243
sprint_count: 1
Release Blocker: Proposed
Blocked: False
Blocked Reason: None
Release Note Type: Release Note Not Required
Release Note Status: In Progress
Target Version: 4.15.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The HyperShift Operator does not guarantee that two request-serving nodes remain labeled with the HCP's namespace-name. It likely labels the nodes initially and then does not notice when one of them is deleted by something else.

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Create an HCP with dedicated request-serving nodes
2. Delete one of the request-serving nodes (by deleting the node directly or by deleting its machine)
3. Observe that the replacement node does not have the label required for scheduling the HCP's request-serving pods

Actual results:

An HCP can exist without two nodes labeled with the HCP's name, which leaves kube-apiserver pods unschedulable

❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-34-188.us-east-2.compute.internal   Ready    worker   9h    v1.27.6+1648878

❯ k get po -n ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012 -lapp=kube-apiserver -owide   
NAME                             READY   STATUS    RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
kube-apiserver-54854bcb7-v88dq   0/5     Pending   0          151m   <none>         <none>                                      <none>           <none>
kube-apiserver-54854bcb7-x5jqt   5/5     Running   0          3h2m   10.128.236.6   ip-10-0-34-188.us-east-2.compute.internal   <none>           <none>
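
The symptom shown above can be checked without reading the table by eye. The following is a minimal sketch, not part of HyperShift: the function name `pending_kas_pods` and the `KUBECTL` override variable are hypothetical, and it assumes `k` in the output above is an alias for `kubectl`. It lists only the kube-apiserver pods in an HCP namespace whose phase is Pending.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience variable).
KUBECTL="${KUBECTL:-kubectl}"

# Print the names of Pending kube-apiserver pods in the HCP namespace "$1".
# Prints nothing when every kube-apiserver pod has been scheduled.
pending_kas_pods() {
  "$KUBECTL" get pods -n "$1" -l app=kube-apiserver \
    --field-selector=status.phase=Pending \
    --no-headers -o custom-columns=NAME:.metadata.name 2>/dev/null
}

# Usage (commented out so that sourcing this file does not contact a cluster):
# pending_kas_pods ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012
```

Against the affected cluster above, this would be expected to print the one Pending pod. A Pending phase alone does not prove the cause is a missing node label, so the node check is still needed.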

Expected results:

Every HCP has two nodes labeled with the HCP's name

❯ k get po -n ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017 -l app=kube-apiserver -owide
NAME                            READY   STATUS    RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
kube-apiserver-5f85cd4b-l57qr   5/5     Running   0          169m   10.128.218.6   ip-10-0-114-35.us-east-2.compute.internal   <none>           <none>
kube-apiserver-5f85cd4b-lqfsx   5/5     Running   0          169m   10.128.129.6   ip-10-0-59-232.us-east-2.compute.internal   <none>           <none>

❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-114-35.us-east-2.compute.internal   Ready    worker   24h    v1.27.6+1648878
ip-10-0-59-232.us-east-2.compute.internal   Ready    worker   5d2h   v1.27.6+1648878
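
The expected state can be expressed as a small check. This is a sketch under stated assumptions rather than the operator's logic: the function names, the `KUBECTL` override and `REQUIRED_NODES` are hypothetical, and the only details taken from this report are the `hypershift.openshift.io/cluster` label key and the requirement of two labeled nodes per HCP.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience variable).
KUBECTL="${KUBECTL:-kubectl}"
# Number of request-serving nodes each HCP is expected to have labeled.
REQUIRED_NODES=2

# Print how many nodes carry hypershift.openshift.io/cluster=<hcp>.
# Note: a kubectl failure is also reported as 0 because stderr is discarded.
count_labeled_nodes() {
  "$KUBECTL" get nodes -l "hypershift.openshift.io/cluster=$1" --no-headers 2>/dev/null \
    | grep -c . || true
}

# Return 0 only when the HCP has exactly REQUIRED_NODES labeled nodes.
check_hcp_nodes() {
  n="$(count_labeled_nodes "$1")"
  if [ "$n" -ne "$REQUIRED_NODES" ]; then
    echo "HCP $1: $n labeled node(s), expected $REQUIRED_NODES" >&2
    return 1
  fi
  echo "HCP $1: OK ($n labeled nodes)"
}

# Usage (commented out so that sourcing this file does not contact a cluster):
# check_hcp_nodes ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017
```

Running `check_hcp_nodes` after deleting a request-serving node and waiting for its replacement would show whether the replacement received the label.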

Additional info:

links to

openshift/hypershift#3077: OCPBUGS-20105: OCPBUGS-20109: Update the scheduler to only accept paired Nodes and check scheduler HCs has two Nodes

RHEA-2023:7198 rpm

mentioned on

Merge request - Bump IBM integration to our latest prod image.

Assignee: Alberto Garcia Lamela
Reporter: Michael Shen (Inactive)
QA Contact: Jie Zhao
Votes: 0
Watchers: 13
Created: 2023/10/04 6:52 PM
Updated: 2024/04/29 5:03 PM
Resolved: 2024/02/27 8:51 PM
