  2. OCPBUGS-65670

cloud-credential-operator hitting quota errors


    • Quality / Stability / Reliability

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      Infrastructure - quota exceeded or hit rate limit

      Significant regression detected.
      Fisher's Exact probability of a regression: 99.97%.
      Test pass rate dropped from 99.32% to 92.50%.

      Sample (being evaluated) Release: 4.21
      Start Time: 2025-11-10T00:00:00Z
      End Time: 2025-11-17T12:00:00Z
      Success Rate: 92.50%
      Successes: 37
      Failures: 3
      Flakes: 0
      Base (historical) Release: 4.20
      Start Time: 2025-10-18T00:00:00Z
      End Time: 2025-11-17T12:00:00Z
      Success Rate: 99.32%
      Successes: 146
      Failures: 1
      Flakes: 0
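The pass rates above follow directly from the success and failure counts, and the counts form a 2x2 table that a Fisher's exact test can be run against. The sketch below is only an illustration: the function names are hypothetical, and the report does not say exactly how Component Readiness derives its probability, so the one-sided p-value computed here should not be expected to reproduce the 99.97% figure.

```python
from math import comb


def pass_rate(successes, failures):
    """Fraction of runs that passed."""
    return successes / (successes + failures)


def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """One-sided Fisher's exact p-value (hypergeometric tail).

    Probability of seeing at least `sample_fail` failures in the sample
    if failures were spread across sample and base runs purely by chance.
    """
    n_sample = sample_fail + sample_pass
    n_base = base_fail + base_pass
    total_fail = sample_fail + base_fail
    denominator = comb(n_sample + n_base, total_fail)
    upper = min(total_fail, n_sample)
    tail = sum(
        comb(n_sample, k) * comb(n_base, total_fail - k)
        for k in range(sample_fail, upper + 1)
    )
    return tail / denominator


# Counts taken from the report: 4.21 sample vs 4.20 base.
sample_rate = pass_rate(37, 3)
base_rate = pass_rate(146, 1)
p_value = fisher_one_sided(sample_fail=3, sample_pass=37, base_fail=1, base_pass=146)

print(f"sample pass rate: {sample_rate:.2%}")  # 92.50%
print(f"base pass rate:   {base_rate:.2%}")    # 99.32%
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value means the extra failures in the 4.21 sample are unlikely to be a chance fluctuation relative to 4.20.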

      View the test details report for additional context.

      Appears to be new in 4.21.

      {  pods/openshift-cloud-credential-operator_cloud-credential-operator-69b65b8776-g9kb4_cloud-credential-operator.log.gz:time="2025-11-13T19:02:19Z" level=error msg="unknown error getting user: ci-op-czqwkfvw-1389b-openshift-cloud-network-config-contro-x96lq" actuator=aws cr=openshift-cloud-credential-operator/openshift-cloud-network-config-controller-aws error="operation error IAM: GetUser, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: f6c8f7e6-e66b-426e-ae4e-e99e5d230f0e, api error Throttling: Rate exceeded"
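For triage across many job runs, it can help to pull the structured fields out of log lines like the one above and flag the ones caused by throttling. The following is a minimal sketch that assumes the `key=value` / `key="quoted value"` layout visible in that line; `parse_fields` and `is_throttled` are hypothetical helper names, not part of the cloud-credential-operator.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (with backslash escapes allowed) or a run of non-whitespace characters.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return a dict of the key=value fields found in one log line."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        if value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def is_throttled(fields):
    """True if the error field reports an AWS throttling response."""
    return "Throttling" in fields.get("error", "")


# The log line quoted in this bug, split across string literals for readability.
line = (
    'pods/openshift-cloud-credential-operator_cloud-credential-operator-69b65b8776-g9kb4'
    '_cloud-credential-operator.log.gz:time="2025-11-13T19:02:19Z" level=error '
    'msg="unknown error getting user: ci-op-czqwkfvw-1389b-openshift-cloud-network-config-contro-x96lq" '
    'actuator=aws cr=openshift-cloud-credential-operator/openshift-cloud-network-config-controller-aws '
    'error="operation error IAM: GetUser, exceeded maximum number of attempts, 3, '
    'https response error StatusCode: 400, RequestID: f6c8f7e6-e66b-426e-ae4e-e99e5d230f0e, '
    'api error Throttling: Rate exceeded"'
)

fields = parse_fields(line)
print(fields["actuator"], fields["cr"], is_throttled(fields))
```

Grouping the parsed `cr` values across failing runs would show whether one credentials request or all of them are hitting the IAM rate limit.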
      

      We don't see CCO here much, so to make sure the recipient of this bug is aware: this is a Component Readiness regression. We treat these as release blockers because they show a statistically significant change in behaviour. In this case, a problem that didn't occur before at all now occurs several times a week. We need to keep the signal stable to know whether we're stable, so we cannot ignore the problem. If the issue cannot be rectified in time for GA, an SBAR document will have to be prepared for OCP leadership explaining why the regression should be allowed and could not be fixed.

      Based on this larger report covering the past month, the problem looks quite recent, starting within the last week or so.

      Given the nature of this, perhaps it's just a temporary overload? Are there any recent CCO changes?

      Filed by: dgoodwin@redhat.com

              jialiu@redhat.com Johnny Liu
              openshift-trt OpenShift Technical Release Team