OpenShift Bugs / OCPBUGS-60204

[release-4.19] CI fails on testGatewayAPIIstioInstallation because Istiod has too many pods


    • Quality / Stability / Reliability
    • Low
    • Rejected
    • NI&D Sprint 276
    • Proposed
    • Bug Fix
    • Before this update, the `HorizontalPodAutoscaler` temporarily scaled the `istiod-openshift-gateway` deployment to two replicas, causing continuous integration (CI) failures because the tests expected exactly one replica. With this release, the check verifies that the `istiod-openshift-gateway` deployment has at least one replica, so temporary `HorizontalPodAutoscaler` scaling no longer causes failures. (link:https://issues.redhat.com/browse/OCPBUGS-60204[OCPBUGS-60204])

      This is a clone of issue OCPBUGS-59894. The following is the description of the original issue:

      Description of problem

      CI can fail because of test failures such as the following:

          gateway_api_test.go:158: failed to find expected Istiod control plane: too many pods for deployment openshift-ingress/istiod-openshift-gateway: 2
      

      This failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1245/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1949881963301048320.
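
      For illustration only, a more tolerant version of this check could accept any positive pod count rather than exactly one. The helper below is a hypothetical sketch, not the actual gateway_api_test.go code:

```go
package main

import "fmt"

// verifyIstiodPodCount is a hypothetical sketch of a relaxed control-plane
// check: rather than failing when the istiod deployment has more than one
// pod (for example, during temporary HPA scale-out), it only requires that
// at least one pod exists.
func verifyIstiodPodCount(deployment string, podCount int) error {
	if podCount < 1 {
		return fmt.Errorf("failed to find expected Istiod control plane: no pods for deployment %s", deployment)
	}
	return nil
}

func main() {
	// Two pods (HPA scaled out) passes; zero pods still fails.
	fmt.Println(verifyIstiodPodCount("openshift-ingress/istiod-openshift-gateway", 2))
	fmt.Println(verifyIstiodPodCount("openshift-ingress/istiod-openshift-gateway", 0))
}
```

      Under this relaxed condition, the transient scale-out to 2 replicas described above would not fail the test.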

      Version-Release number of selected component (if applicable)

      I have seen this in 4.20.

      How reproducible

      I have only seen it happen once.

      Steps to Reproduce

      1. Post a PR and have bad luck.

      Actual results

      CI fails.

      Expected results

      CI passes, or fails on some other test failure.

      Additional info

      The failure occurred because HPA scaled istiod out temporarily to 2 replicas. I found the following event in the must-gather archive for the referenced CI run:

      apiVersion: v1
      count: 1
      eventTime: null
      firstTimestamp: "2025-07-28T19:10:42Z"
      involvedObject:
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        name: istiod-openshift-gateway
        namespace: openshift-ingress
        resourceVersion: "86760"
        uid: 112b10f4-bad3-433f-9bb0-f0c1ca333e06
      kind: Event
      lastTimestamp: "2025-07-28T19:10:42Z"
      message: 'New size: 2; reason: cpu resource utilization (percentage of request)
        above target'
      metadata:
        creationTimestamp: "2025-07-28T19:10:42Z"
        managedFields:
        # ...
        name: istiod-openshift-gateway.18567fffeaf8b275
        namespace: openshift-ingress
        resourceVersion: "86914"
        uid: e3f5d32f-9370-46be-ae56-934293cf68f7
      reason: SuccessfulRescale
      reportingComponent: horizontal-pod-autoscaler
      reportingInstance: ""
      source:
        component: horizontal-pod-autoscaler
      type: Normal
      

      We can consider turning off HPA, but it isn't clear why the test expects the number of pod replicas to be exactly 1.
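
      If turning off HPA scale-out were pursued, one illustrative approach (an assumption, not the operator's actual manifest) is to pin the autoscaler's minimum and maximum replicas to the same value so it can never scale out:

```yaml
# Hypothetical HorizontalPodAutoscaler pinned to a single replica.
# With minReplicas == maxReplicas, the HPA cannot scale istiod out,
# regardless of CPU utilization. Illustrative only; field values other
# than the name/namespace taken from the event above are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: istiod-openshift-gateway
  namespace: openshift-ingress
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: istiod-openshift-gateway
  minReplicas: 1
  maxReplicas: 1
```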

              rh-ee-iamin Ishmam Amin
              mmasters1@redhat.com Miciah Masters
              Votes: 0
              Watchers: 7