OpenShift Bugs / OCPBUGS-59894

CI fails on testGatewayAPIIstioInstallation because Istiod has too many pods


    • Quality / Stability / Reliability
    • False
    • None
    • 2
    • Low
    • None
    • None
    • Rejected
    • NI&D Sprint 274, NI&D Sprint 275
    • 2
    • Done
    • Bug Fix
    • Before this update, the `HorizontalPodAutoscaler` object temporarily scaled the `istiod-openshift-gateway` deployment to two replicas. This caused Continuous Integration (CI) failures because the tests expected exactly one replica. With this release, `HorizontalPodAutoscaler` scaling is tolerated: the check verifies that the `istiod-openshift-gateway` deployment has at least one replica before deployment continues. (link:https://issues.redhat.com/browse/OCPBUGS-59894[OCPBUGS-59894])
    • None
    • None
    • None
    • None

      Description of problem

      CI can fail because of test failures such as the following:

          gateway_api_test.go:158: failed to find expected Istiod control plane: too many pods for deployment openshift-ingress/istiod-openshift-gateway: 2
      

      This failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1245/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1949881963301048320.

      Version-Release number of selected component (if applicable)

      I have seen this in 4.20.

      How reproducible

      I have only seen it happen once.

      Steps to Reproduce

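      1. Post a PR and have bad luck: the failure is intermittent and only happens when HPA scales istiod out during the test run (see Additional info).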

      Actual results

      CI fails.

      Expected results

      CI passes, or fails on some other test failure.

      Additional info

      The failure occurred because the HorizontalPodAutoscaler (HPA) temporarily scaled istiod out to 2 replicas. I found the following event in the must-gather archive for the referenced CI run:

      apiVersion: v1
      count: 1
      eventTime: null
      firstTimestamp: "2025-07-28T19:10:42Z"
      involvedObject:
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        name: istiod-openshift-gateway
        namespace: openshift-ingress
        resourceVersion: "86760"
        uid: 112b10f4-bad3-433f-9bb0-f0c1ca333e06
      kind: Event
      lastTimestamp: "2025-07-28T19:10:42Z"
      message: 'New size: 2; reason: cpu resource utilization (percentage of request)
        above target'
      metadata:
        creationTimestamp: "2025-07-28T19:10:42Z"
        managedFields:
        # ...
        name: istiod-openshift-gateway.18567fffeaf8b275
        namespace: openshift-ingress
        resourceVersion: "86914"
        uid: e3f5d32f-9370-46be-ae56-934293cf68f7
      reason: SuccessfulRescale
      reportingComponent: horizontal-pod-autoscaler
      reportingInstance: ""
      source:
        component: horizontal-pod-autoscaler
      type: Normal
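
      To spot the same condition in other CI runs, one could scan the collected events for `SuccessfulRescale` entries from the `istiod-openshift-gateway` HPA. The Go sketch below is hypothetical: the `event` struct is a trimmed stand-in for the Kubernetes `Event` type (not the real `k8s.io/api` type), `hpaRescales` is an invented helper, and it assumes the message always has the "New size: N; reason: ..." shape seen in the event above. Note that the folded YAML `message` is a single line once parsed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// event is a hypothetical, trimmed-down stand-in for a Kubernetes Event;
// only the fields needed to recognize an HPA rescale are kept.
type event struct {
	Kind    string // involvedObject.kind
	Name    string // involvedObject.name
	Reason  string
	Message string
}

// rescaleMsg matches messages such as
// "New size: 2; reason: cpu resource utilization (percentage of request) above target".
var rescaleMsg = regexp.MustCompile(`^New size: (\d+); reason: (.+)$`)

// hpaRescales returns the new replica counts reported by SuccessfulRescale
// events for the named HorizontalPodAutoscaler, in input order.
func hpaRescales(events []event, hpaName string) []int {
	var sizes []int
	for _, e := range events {
		if e.Kind != "HorizontalPodAutoscaler" || e.Name != hpaName || e.Reason != "SuccessfulRescale" {
			continue
		}
		m := rescaleMsg.FindStringSubmatch(e.Message)
		if m == nil {
			continue // message does not have the assumed format
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		sizes = append(sizes, n)
	}
	return sizes
}

func main() {
	events := []event{{
		Kind:    "HorizontalPodAutoscaler",
		Name:    "istiod-openshift-gateway",
		Reason:  "SuccessfulRescale",
		Message: "New size: 2; reason: cpu resource utilization (percentage of request) above target",
	}}
	fmt.Println(hpaRescales(events, "istiod-openshift-gateway")) // [2]
}
```

      Any reported size above 1 during the test window would explain a "too many pods" failure like the one above.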
      

      We can consider turning off HPA, but it isn't clear why the test expects the number of pod replicas to be exactly 1.
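
      The difference between the current expectation and a more tolerant one can be sketched in Go. This is not the code in gateway_api_test.go: `checkExactlyOne` and `checkAtLeastOne` are hypothetical helpers that take a pod count instead of listing pods through the API, and their error strings are only modeled on the CI output above.

```go
package main

import "fmt"

// checkExactlyOne mirrors the strict expectation implied by the CI error:
// any pod count other than one is treated as a failure.
func checkExactlyOne(deployment string, pods int) error {
	switch {
	case pods == 0:
		return fmt.Errorf("no pods for deployment %s", deployment)
	case pods > 1:
		return fmt.Errorf("too many pods for deployment %s: %d", deployment, pods)
	}
	return nil
}

// checkAtLeastOne tolerates HPA scale-out: it only requires that
// at least one pod exists for the deployment.
func checkAtLeastOne(deployment string, pods int) error {
	if pods < 1 {
		return fmt.Errorf("no pods for deployment %s", deployment)
	}
	return nil
}

func main() {
	const d = "openshift-ingress/istiod-openshift-gateway"
	fmt.Println(checkExactlyOne(d, 2)) // too many pods for deployment openshift-ingress/istiod-openshift-gateway: 2
	fmt.Println(checkAtLeastOne(d, 2)) // <nil>
}
```

      Requiring at least one pod still confirms that the Istiod control plane is running, while letting HPA add replicas under CPU load, so HPA would not have to be turned off for the test to pass.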

              rh-ee-iamin Ishmam Amin
              mmasters1@redhat.com Miciah Masters
              None
              None
              Ishmam Amin
              Darragh Fitzmaurice
              Votes: 0
              Watchers: 6
