Type: Bug
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.20
Component/s: Cloud Compute / Machine API Providers
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

After scaling up to 400 worker nodes on GCP (masters n2-standard-4, workers n2-standard-2), the cluster becomes unreachable once all machines are Running.

Version-Release number of selected component (if applicable):

4.20

How reproducible:

Always

Steps to Reproduce:

1. Set up a cluster on GCP with n2-standard-4 masters and n2-standard-2 workers.
2. Scale the worker machinesets up to a total of 400 replicas.
3. After all machines are Running, run any oc command.
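A minimal shell sketch of steps 2 and 3, assuming the three machineset names from the session log in this report and the default `openshift-machine-api` namespace. The helper names and the 90-minute timeout are hypothetical choices, not part of the original reproduction.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 2-3.
# Assumption: Machine API objects live in the default openshift-machine-api namespace.
NS=openshift-machine-api

# Step 2: scale the worker machinesets to 400 replicas in total (200 + 100 + 100),
# using the machineset names seen in this report.
scale_workers() {
  oc scale machineset zhsungcp-t9nql-worker-b --replicas 200 -n "$NS"
  oc scale machineset zhsungcp-t9nql-worker-c --replicas 100 -n "$NS"
  oc scale machineset zhsungcp-t9nql-worker-d --replicas 100 -n "$NS"
}

# Step 3: wait until every Machine reports phase Running, then run an oc command.
# The 90m timeout is an arbitrary (hypothetical) value.
wait_and_probe() {
  oc wait machines --all -n "$NS" \
    --for=jsonpath='{.status.phase}'=Running --timeout=90m
  oc get co
}

# Usage: scale_workers && wait_and_probe
```

`oc wait --all` most likely only considers the Machines that exist when it starts, so it is safest to call it after the machinesets have created their Machine objects.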

Actual results:

oc commands always fail with "Unable to connect to the server: EOF".

$ oc scale machineset zhsungcp-t9nql-worker-b --replicas 200     [15:39:41] 
oc scale machineset zhsungcp-t9nql-worker-c --replicas 100
oc scale machineset zhsungcp-t9nql-worker-d --replicas 100
machineset.machine.openshift.io/zhsungcp-t9nql-worker-b scaled
machineset.machine.openshift.io/zhsungcp-t9nql-worker-c scaled
machineset.machine.openshift.io/zhsungcp-t9nql-worker-d scaled

$ oc get machine | grep Provisioning | wc -l             [15:59:36]
oc get machine | grep Provisioned | wc -l
oc get machine | grep Running | wc -l
      18
      59
     332

$ oc get machine | grep Provisioning | wc -l             [16:06:43]
oc get machine | grep Provisioned | wc -l
oc get machine | grep Running | wc -l
       0
       0
     403
$ oc get csr | grep Pending | wc -l                   [16:08:14]
       0
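The separate `grep | wc -l` passes above can be collapsed into a single call that tallies every phase at once. This is a sketch assuming the Machines live in `openshift-machine-api`; `machine_phases` and `tally_phases` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print one .status.phase per Machine (requires a reachable API server).
# Assumption: Machines are in the openshift-machine-api namespace.
machine_phases() {
  oc get machines -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}'
}

# Read one phase per line on stdin and print "<phase> <count>", sorted by phase.
# Blank lines (Machines with no phase yet) are skipped.
tally_phases() {
  awk 'NF { n[$1]++ } END { for (p in n) print p, n[p] }' | sort
}

# Usage: machine_phases | tally_phases
```

Matching on the phase field rather than grepping whole `oc get machine` lines avoids counting a machine whose name or another column happens to contain the search word.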

mapi_machine_phase_transition_seconds_sum{phase="Running"} 393510.21958385484
mapi_machine_phase_transition_seconds_count{phase="Running"} 403
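Dividing the `_sum` series by the `_count` series gives the mean transition time per machine: 393510.22 / 403 ≈ 976.5 s, or about 16.3 minutes, assuming the metric records how long each machine took to enter the labelled phase. A small awk sketch that computes the same value from Prometheus text-format lines; `mean_running_transition` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean Running-transition time from the _sum and _count
# lines of mapi_machine_phase_transition_seconds (Prometheus text format on stdin).
# Assumption: each series carries only the phase label, as in this report.
mean_running_transition() {
  awk '
    $1 == "mapi_machine_phase_transition_seconds_sum{phase=\"Running\"}"   { sum = $2 }
    $1 == "mapi_machine_phase_transition_seconds_count{phase=\"Running\"}" { count = $2 }
    END {
      if (count > 0)
        printf "mean: %.1f s (%.1f min) over %d machines\n", sum / count, sum / count / 60, count
    }'
}

# Usage with the values from this report:
# printf '%s\n' \
#   'mapi_machine_phase_transition_seconds_sum{phase="Running"} 393510.21958385484' \
#   'mapi_machine_phase_transition_seconds_count{phase="Running"} 403' | mean_running_transition
# → mean: 976.5 s (16.3 min) over 403 machines
```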

 $ oc get co                              [16:32:18]
Unable to connect to the server: EOF

Expected results:

The cluster remains reachable.

Additional info:

Assignee:: Theo Barber-Bany

Reporter:: Zhaohua Sun

Need Info From:: None

Contributors:: None

QA Contact:: Zhaohua Sun

Doc Contact:: None

Votes:: 0

Watchers:: 2

Created:: 2025/06/12 6:57 AM

Updated:: 2025/07/12 1:16 PM

Resolved:: 2025/06/13 8:41 AM
