OpenShift Bugs / OCPBUGS-63224

Flake: [sig-autoscaling] [Feature:HPA] Horizontal pod autoscaling (scale resource: CPU) CustomResourceDefinition Should scale with a CRD targetRef

      Description of problem:

      Component readiness failures on:
      [sig-autoscaling] [Feature:HPA] Horizontal pod autoscaling (scale resource: CPU) CustomResourceDefinition Should scale with a CRD targetRef [Suite:openshift/conformance/parallel] [Suite:k8s]

      Always looks like:
                      {  fail [k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:211]: timeout waiting 15m0s for 2 replicas: Told to stop trying after 60.251s.
      Unexpected final error while getting int: Operation cannot be fulfilled on testcrds.autoscalinge2e.example.com "foo-crd": the object has been modified; please apply your changes to the latest version and try again
      At one point, however, the function did return successfully.
      Yet, Eventually failed because the matcher was not satisfied:
      Expected
          <int>: 1
      to equal
          <int>: 2}
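
      For context, the quoted "Operation cannot be fulfilled ... the object has been modified" line is the API server's optimistic-concurrency conflict (HTTP 409), returned when a write carries a stale resourceVersion. Below is a minimal, illustrative Go sketch (not code from the test) showing how that class of error is constructed and how apierrors.IsConflict, the usual check for whether a failed write can simply be retried, recognizes it:

      package main

      import (
          "errors"
          "fmt"

          apierrors "k8s.io/apimachinery/pkg/api/errors"
          "k8s.io/apimachinery/pkg/runtime/schema"
      )

      func main() {
          // Construct the same class of error the API server returns when a write
          // carries a stale resourceVersion, matching the message quoted above.
          conflict := apierrors.NewConflict(
              schema.GroupResource{Group: "autoscalinge2e.example.com", Resource: "testcrds"},
              "foo-crd",
              errors.New("the object has been modified; please apply your changes to the latest version and try again"),
          )

          // IsConflict recognizes it as a retryable 409 rather than a hard failure.
          fmt.Println(apierrors.IsConflict(conflict)) // true
      }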

      Version-Release number of selected component (if applicable):

          HPA OpenShift 4.20

      How reproducible:

          Reproduces roughly 5% of the time, but it is a persistent flake.

      Steps to Reproduce:

          1. Run the test case in a loop.
          2. Observe a failure when the status update to the CRD happens _after_ the HPA updates the custom resource.
          3. Profit.
          

      Actual results:

          Test case occasionally flakes

      Expected results:

          Test case does not flake

      Additional info:

          This is such a strange failure because it happens inside a hack the upstream test uses to imitate controller behavior: the test updates the custom resource's status with the replica count (the HPA needs that information to scale), but sometimes the HPA is too fast and the test loses the race. The retry function does not currently treat that collision as a retryable error, so it simply fails.

      The update in question that is colliding: https://github.com/kubernetes/kubernetes/blob/3b632270e9b866ee8bf62e89377ae95987671b49/test/e2e/framework/autoscaling/autoscaling_utils.go#L480 
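
      One way the race could be tolerated is to make that status write conflict-aware. The sketch below is illustrative, not the upstream fix: the helper name setCRDStatusReplicas and its parameters are made up, and it assumes the CRD's status subresource is enabled. It wraps the dynamic-client status update in client-go's retry.RetryOnConflict, which re-reads the object and retries on exactly the conflict quoted above:

      package e2esketch

      import (
          "context"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
          "k8s.io/apimachinery/pkg/runtime/schema"
          "k8s.io/client-go/dynamic"
          "k8s.io/client-go/util/retry"
      )

      // setCRDStatusReplicas (hypothetical helper) writes .status.replicas on the
      // test custom resource, retrying whenever the HPA (or anything else) has
      // modified the object between the read and the write.
      func setCRDStatusReplicas(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource, ns, name string, replicas int64) error {
          return retry.RetryOnConflict(retry.DefaultRetry, func() error {
              // Re-fetch the object on each attempt so the resourceVersion is current.
              obj, err := client.Resource(gvr).Namespace(ns).Get(ctx, name, metav1.GetOptions{})
              if err != nil {
                  return err
              }
              if err := unstructured.SetNestedField(obj.Object, replicas, "status", "replicas"); err != nil {
                  return err
              }
              // A conflict here ("the object has been modified ...") is handed back to
              // RetryOnConflict, which re-runs this closure instead of failing the test.
              _, err = client.Resource(gvr).Namespace(ns).UpdateStatus(ctx, obj, metav1.UpdateOptions{})
              return err
          })
      }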
