OpenShift Request For Enhancement
RFE-8834

Avoid transient Cluster Autoscaler scale-out triggered by metrics-server component RollingUpdate in minimal-node ROSA HCP clusters


    • Feature Request
    • Resolution: Unresolved
    • ROSA
    • Product / Portfolio Work

      Issue:

      In ROSA HCP clusters configured with:
      - Minimum worker nodes = 2
      - Cluster Autoscaler enabled
      - Default OpenShift managed components configuration

      We observe transient node scale-out events triggered during
      RollingUpdate of the OpenShift metrics-server component, even though
      the existing 2 nodes have sufficient CPU, memory, and other resources.

       

      Detailed Technical Analysis:

      Found:
      Deployment: metrics-server
      Strategy: RollingUpdate
      maxUnavailable: 1
      maxSurge: 25%
      Container args: --shutdown-delay-duration=150s
      Replicas: 2

      This can leave the new (surge) pod in Pending status while the old pod
      is still in Terminating status and holding its resources.

      $ oc describe deployment/metrics-server -n openshift-monitoring
      Name:                   metrics-server
      ...
      Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
      StrategyType:           RollingUpdate
      MinReadySeconds:        0
      RollingUpdateStrategy:  1 max unavailable, 25% max surge
      ...
        Containers:
            --shutdown-delay-duration=150s
      ...
      $ oc get pods/metrics-server-xxxx -n openshift-monitoring -o yaml
      apiVersion: v1
      kind: Pod
      ...
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/component: metrics-server
                  app.kubernetes.io/name: metrics-server
                  app.kubernetes.io/part-of: openshift-monitoring
              namespaces:
              - openshift-monitoring
              topologyKey: kubernetes.io/hostname
      
      

      Sequence during update:
      1. The old metrics-server pod enters graceful termination (150s shutdown delay)
      2. A new metrics-server pod is created due to maxSurge
      3. For a period of time:
         a) The old pod is Terminating but still holding resources
         b) The new pod is Pending (the required pod anti-affinity on
            kubernetes.io/hostname shown above also blocks it from either
            node that still runs a metrics-server pod)
      4. The scheduler determines there is insufficient schedulable capacity
      5. The Cluster Autoscaler sees the unschedulable pod
      6. The Autoscaler scales out
      7. After the old pod exits, scale-down eventually occurs
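
      The window above can be observed directly while a metrics-server rollout
      is in progress. A rough sketch, assuming access to a live ROSA HCP
      cluster; the label selector comes from the pod labels shown in the YAML
      above:

      # Illustrative only: requires a live cluster during the rollout.
      # Watch the old pod go Terminating while the surge pod sits Pending:
      $ oc get pods -n openshift-monitoring \
          -l app.kubernetes.io/name=metrics-server -w

      # Check whether the autoscaler reacted to the Pending surge pod:
      $ oc get events -n openshift-monitoring \
          --field-selector reason=TriggeredScaleUp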

      I0202 08:05:57.430673       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-monitoring", Name:"metrics-server-xxxxx", UID:"xxxxx", APIVersion:"v1", ResourceVersion:"xxxxx", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up: [{MachineDeployment/ocm-production-xxxxx-xxxxx/xxxxx-workers-1 1->2 (max: 2)}]
      
      
      I0202 08:22:39.484581       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-xxxxxx.ap-northeast-1.compute.internal", UID:"xxxxxx", APIVersion:"v1", ResourceVersion:"xxxxx", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' marked the node as toBeDeleted/unschedulable
      

      Key Point:

      The autoscaler's decision is made while the terminating pod still
      occupies resources but before they are reclaimed. This creates a
      temporary scheduling-pressure window that does not reflect steady-state
      cluster capacity.

       

      Why This Matters:
      In minimal-node ROSA clusters (2 nodes):
      - Platform components run with tight packing
      - Graceful termination windows create deterministic transient capacity pressure
      - This pressure consistently triggers scale-out

      Result:
      - Additional AWS instances launched
      - Short-lived infrastructure cost
      - Scale-out events not driven by customer workload

      Request: Requesting engineering evaluation of whether metrics-server
      rolling updates in minimal-node clusters can be handled in a more
      topology-aware manner, to prevent transient scale-out that is not
      driven by user workload.

       
      The changes below may help avoid this issue; please check whether either helps.

      1. Add the cluster-autoscaler.kubernetes.io/pod-scale-up-delay: "150s" annotation to the pod template, so a pending pod does not immediately trigger the autoscaler
      2. Or change the RollingUpdateStrategy to maxSurge: 0, maxUnavailable: 1 so no extra pod is created during rollout
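
      As a sketch only: on ROSA the metrics-server Deployment is managed by
      the Cluster Monitoring Operator, so a direct edit would likely be
      reconciled away, but the two suggested settings would look like this in
      the Deployment spec (the "150s" value mirrors the
      --shutdown-delay-duration above). Upstream, the Cluster Autoscaler also
      has a global --new-pod-scale-up-delay flag with a similar effect.

      ```yaml
      # Illustrative fragment of deployment/metrics-server; manual edits are
      # expected to be reconciled away by the operator.
      spec:
        strategy:
          type: RollingUpdate
          rollingUpdate:
            maxSurge: 0        # option 2: no surge pod, so nothing goes Pending
            maxUnavailable: 1  # one replica is recreated in place
        template:
          metadata:
            annotations:
              # option 1: have the autoscaler wait out the shutdown delay
              # before reacting to a Pending pod from this Deployment
              cluster-autoscaler.kubernetes.io/pod-scale-up-delay: "150s"
      ```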

              rh-ee-adejong Aaren de Jong
              rhn-support-jayu Jacob Yu