OpenShift Bugs / OCPBUGS-63224

Flake: [sig-autoscaling] [Feature:HPA] Horizontal pod autoscaling (scale resource: CPU) CustomResourceDefinition Should scale with a CRD targetRef

      Description of problem:

      Component readiness failures on:
      [sig-autoscaling] [Feature:HPA] Horizontal pod autoscaling (scale resource: CPU) CustomResourceDefinition Should scale with a CRD targetRef [Suite:openshift/conformance/parallel] [Suite:k8s]

      Always looks like:
                      {  fail [k8s.io/kubernetes/test/e2e/autoscaling/horizontal_pod_autoscaling.go:211]: timeout waiting 15m0s for 2 replicas: Told to stop trying after 60.251s.
      Unexpected final error while getting int: Operation cannot be fulfilled on testcrds.autoscalinge2e.example.com "foo-crd": the object has been modified; please apply your changes to the latest version and try again
      At one point, however, the function did return successfully.
      Yet, Eventually failed because the matcher was not satisfied:
      Expected
          <int>: 1
      to equal
          <int>: 2}
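
      For context, the quoted "Operation cannot be fulfilled ... the object has been modified" line is the API server's optimistic-concurrency conflict (HTTP 409), returned when a write carries a stale resourceVersion. Below is a minimal, illustrative Go sketch (not code from the test) showing how that class of error is constructed and how apierrors.IsConflict, the usual check for whether a failed write can simply be retried, recognizes it:

      package main

      import (
          "errors"
          "fmt"

          apierrors "k8s.io/apimachinery/pkg/api/errors"
          "k8s.io/apimachinery/pkg/runtime/schema"
      )

      func main() {
          // Construct the same class of error the API server returns when a write
          // carries a stale resourceVersion, matching the message quoted above.
          conflict := apierrors.NewConflict(
              schema.GroupResource{Group: "autoscalinge2e.example.com", Resource: "testcrds"},
              "foo-crd",
              errors.New("the object has been modified; please apply your changes to the latest version and try again"),
          )

          // IsConflict recognizes it as a retryable 409 rather than a hard failure.
          fmt.Println(apierrors.IsConflict(conflict)) // true
      }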

      Version-Release number of selected component (if applicable):

          HPA OpenShift 4.20

      How reproducible:

          Reproduces roughly 5% of the time, but it is a persistent flake.

      Steps to Reproduce:

          1. Run the test case in a loop.
          2. Observe a failure when the status update to the CRD happens _after_ the HPA updates the custom resource.
          3. Profit.
          

      Actual results:

          Test case occasionally flakes

      Expected results:

          Test case does not flake

      Additional info:

          This is such a strange failure because it happens inside a hack the upstream test uses to imitate controller behavior: the test updates the custom resource's status with the replica count (the HPA needs that information to scale), but sometimes the HPA is too fast and the test loses the race. The retry function does not currently treat that collision as a retryable error, so it simply fails.

      The update in question that is colliding: https://github.com/kubernetes/kubernetes/blob/3b632270e9b866ee8bf62e89377ae95987671b49/test/e2e/framework/autoscaling/autoscaling_utils.go#L480 
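
      One way the race could be tolerated is to make that status write conflict-aware. The sketch below is illustrative, not the upstream fix: the helper name setCRDStatusReplicas and its parameters are made up, and it assumes the CRD's status subresource is enabled. It wraps the dynamic-client status update in client-go's retry.RetryOnConflict, which re-reads the object and retries on exactly the conflict quoted above:

      package e2esketch

      import (
          "context"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
          "k8s.io/apimachinery/pkg/runtime/schema"
          "k8s.io/client-go/dynamic"
          "k8s.io/client-go/util/retry"
      )

      // setCRDStatusReplicas (hypothetical helper) writes .status.replicas on the
      // test custom resource, retrying whenever the HPA (or anything else) has
      // modified the object between the read and the write.
      func setCRDStatusReplicas(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource, ns, name string, replicas int64) error {
          return retry.RetryOnConflict(retry.DefaultRetry, func() error {
              // Re-fetch the object on each attempt so the resourceVersion is current.
              obj, err := client.Resource(gvr).Namespace(ns).Get(ctx, name, metav1.GetOptions{})
              if err != nil {
                  return err
              }
              if err := unstructured.SetNestedField(obj.Object, replicas, "status", "replicas"); err != nil {
                  return err
              }
              // A conflict here ("the object has been modified ...") is handed back to
              // RetryOnConflict, which re-runs this closure instead of failing the test.
              _, err = client.Resource(gvr).Namespace(ns).UpdateStatus(ctx, obj, metav1.UpdateOptions{})
              return err
          })
      }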
