OpenShift Bugs / OCPBUGS-33987

2 HostedClusters got the same serving node allocation [non-dynamic as well as dynamic setup]


      Description of problem:

          While running the PerfScale test on staging sectors, the script creates 1 HostedCluster (HC) per minute to load a Management Cluster up to its maximum capacity (64 HCs). Two clusters tried to use the same serving node pair and got into a deadlock:
      # oc get nodes -l osd-fleet-manager.openshift.io/paired-nodes=serving-12 
      NAME                                        STATUS   ROLES    AGE   VERSION
      ip-10-0-4-127.us-east-2.compute.internal    Ready    worker   34m   v1.27.11+d8e449a
      ip-10-0-84-196.us-east-2.compute.internal   Ready    worker   34m   v1.27.11+d8e449a
      
      Each node in the pair got assigned to a different cluster:
      # oc get nodes -l hypershift.openshift.io/cluster=ocm-staging-2bcimf68iudmq2pctkj11os571ahutr1-mukri-dysn-0017 
      NAME                                       STATUS   ROLES    AGE   VERSION
      ip-10-0-4-127.us-east-2.compute.internal   Ready    worker   33m   v1.27.11+d8e449a
      
      # oc get nodes -l hypershift.openshift.io/cluster=ocm-staging-2bcind28698qgrugl87laqerhhb0u2c2-mukri-dysn-0019
      NAME                                        STATUS   ROLES    AGE   VERSION
      ip-10-0-84-196.us-east-2.compute.internal   Ready    worker   36m   v1.27.11+d8e449a
      
      Taints were missing on those nodes, so metrics-forwarder pods from other HostedClusters got scheduled onto the serving nodes.
      
      # oc get pods -A -o wide | grep ip-10-0-84-196.us-east-2.compute.internal 
      ocm-staging-2bcind28698qgrugl87laqerhhb0u2c2-mukri-dysn-0019   kube-apiserver-86d4866654-brfkb                                           5/5     Running                  0                40m     10.128.48.6      ip-10-0-84-196.us-east-2.compute.internal    <none>           <none>
      ocm-staging-2bcins06s2acm59sp85g4qd43g9hq42g-mukri-dysn-0020   metrics-forwarder-6d787d5874-69bv7                                        1/1     Running                  0                40m     10.128.48.7      ip-10-0-84-196.us-east-2.compute.internal    <none>           <none>
      
      and a few more.
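
      For reference, the missing taints can be confirmed directly on the affected node. This is a minimal check, assuming a correctly allocated serving node would carry a taint keyed the same as the hypershift.openshift.io/cluster label shown above:

      ## dump the taints on the node; on a dedicated serving node this should not be empty
      # oc get node ip-10-0-84-196.us-east-2.compute.internal -o jsonpath='{.spec.taints}{"\n"}'

      ## any pod here that belongs to an unrelated HC confirms the taint is absent
      # oc get pods -A -o wide --field-selector spec.nodeName=ip-10-0-84-196.us-east-2.compute.internal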

      Version-Release number of selected component (if applicable):

      MC Version 4.14.17
      HC version 4.15.10
      HO Version quay.io/acm-d/rhtap-hypershift-operator:c698d1da049c86c2cfb4c0f61ca052a0654e2fb9

      How reproducible:

      Not always

      Steps to Reproduce:

          1. Create an MC with prod config (non-dynamic serving nodes)
          2. Create HCs on it at a rate of 1 HCP per minute (a minimal sketch of the load loop follows below)
          3. Observe that some clusters stay stuck in the Installing state for more than 30 minutes
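
      A minimal sketch of the load loop, assuming the hypershift CLI is used to create the clusters directly; the actual PerfScale script provisions through OCM staging, and the names, credential paths, base domain, and replica count below are placeholders (flags also vary between hypershift releases):

      ## hypothetical load loop: 1 HC per minute up to 64 HCs
      for i in $(seq -w 1 64); do
        hypershift create cluster aws \
          --name "perf-hc-${i}" \
          --pull-secret ./pull-secret.json \
          --aws-creds ./aws-credentials \
          --base-domain example.devcluster.openshift.com \
          --node-pool-replicas 2
        sleep 60
      done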
          

      Actual results:

      Only one replica of the kube-apiserver pods was up; the second was stuck in Pending. The Machine API had scaled up both nodes in that machineset pair (serving-12), but only one node was assigned (labelled) to each cluster: the node from one zone (serving-12a) was assigned to one hosted cluster (0017), while the node from the other zone (serving-12b) was assigned to a different hosted cluster (0019).
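
      The split allocation and the pending replicas can be confirmed with, for example:

      ## show which HC each node of the serving-12 pair was labelled for
      # oc get nodes -l osd-fleet-manager.openshift.io/paired-nodes=serving-12 -L hypershift.openshift.io/cluster

      ## list kube-apiserver pods stuck in Pending across the HCP namespaces
      # oc get pods -A -o wide --field-selector=status.phase=Pending | grep kube-apiserver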

      Expected results:

      Both kube-apiserver replicas should land on nodes from the same machineset pair (serving-12), and those nodes should be tainted so that pods from other HostedClusters cannot be scheduled there.
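
      For illustration only, a sketch of the expected per-cluster label and taint on both nodes of the pair, assuming the taint key mirrors the hypershift.openshift.io/cluster label used above (in normal operation these are applied by the HyperShift scheduler, not by hand):

      ## label and taint both nodes of the pair for the same hosted cluster
      for node in ip-10-0-4-127.us-east-2.compute.internal ip-10-0-84-196.us-east-2.compute.internal; do
        oc label node "${node}" hypershift.openshift.io/cluster=ocm-staging-2bcimf68iudmq2pctkj11os571ahutr1-mukri-dysn-0017 --overwrite
        oc adm taint node "${node}" hypershift.openshift.io/cluster=ocm-staging-2bcimf68iudmq2pctkj11os571ahutr1-mukri-dysn-0017:NoSchedule --overwrite
      done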

      Additional info: Slack

          

        Attachments:
        1. ho.log (23.92 MB)
        2. kas_unknown.yaml (20 kB)
