OpenShift Bugs / OCPBUGS-61754

oauth and openshift-apiserver related disruptions around operator upgrade


    • Bug
    • Resolution: Unresolved
    • Major
    • 4.21.0
    • 4.20, 4.21
    • kube-apiserver
    • Quality / Stability / Reliability
    • False
    • Yes
    • Approved

      We have seen an oauth/openshift API disruption issue that is likely related to the Kubernetes 1.33 bump in those repos. The pattern is visible in this dashboard: https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption?orgId=1&var-percentile=P50&var-platform=azure&var-backend=oauth-api-new-connections&var-upgrade_type=minor&var-master_nodes_updated=Y&var-architectures=amd64&var-topologies=ha&var-networks=ovn&var-releases=4.21&var-releases=4.20&var-lookback=1&var-min_disruption_regression=-10&var-min_disruption_job_list=0&var-min_relevance=0&var-featureset=All&from=now-38d&to=now.
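
To compare the same dashboard view across releases or backends, the query parameters of the link above can be rewritten programmatically. The following is only a sketch: the helper name `with_overrides` is hypothetical, and the backend name `openshift-api-new-connections` mentioned in the comment is an assumption that is not confirmed by this report.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Dashboard link from the description (sentence-ending period dropped).
DASHBOARD_URL = (
    "https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption"
    "?orgId=1&var-percentile=P50&var-platform=azure"
    "&var-backend=oauth-api-new-connections&var-upgrade_type=minor"
    "&var-master_nodes_updated=Y&var-architectures=amd64&var-topologies=ha"
    "&var-networks=ovn&var-releases=4.21&var-releases=4.20&var-lookback=1"
    "&var-min_disruption_regression=-10&var-min_disruption_job_list=0"
    "&var-min_relevance=0&var-featureset=All&from=now-38d&to=now"
)


def with_overrides(url, overrides):
    """Return url with the given query parameters replaced.

    Hypothetical helper. A value may be a string or a list of strings;
    a list yields a repeated parameter, as var-releases is in the link.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Drop every occurrence of an overridden key, then append the new values.
    kept = [(key, value) for key, value in pairs if key not in overrides]
    for key, value in overrides.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        kept.extend((key, item) for item in values)
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Restrict the view to 4.20 only.
only_4_20 = with_overrides(DASHBOARD_URL, {"var-releases": ["4.20"]})

# Assumed backend name for the openshift-apiserver view; verify before use:
# with_overrides(DASHBOARD_URL, {"var-backend": "openshift-api-new-connections"})
```

Overridden parameters are moved to the end of the query string, which should not matter to a dashboard that reads them by name.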
      
      This is affecting both 4.20 and 4.21 minor upgrade jobs.
      
      Here is a summary of my analysis so far:
      
      * Here is an example job for 4.21: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade/1961248707785527296
      * The disruption seems to happen while the openshift-apiserver and authentication operators are being upgraded.
      * The new signature of the connection error is “etcdserver: leader changed” with code 500.
      * The new pattern seems to show up around these CI payloads: 4.21.0-0.ci-2025-08-29-020134 and 4.20.0-0.ci-2025-08-28-230356
      * The change list for 4.20.0-0.ci-2025-08-28-230356 is very small, and the Kube 1.33 update for oauth-apiserver is in it. The same change also appears among the long list of changes in 4.21.0-0.ci-2025-08-29-020134.
      * A comment in that PR mentions a payload test done in https://github.com/openshift/oauth-server/pull/197, so I checked an azure upgrade job and saw the same disruption pattern there.
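
The signature described above (code 500 with “etcdserver: leader changed”) can be checked mechanically against disruption samples from a job. This is a minimal sketch under stated assumptions: the `Sample` record shape and both function names are hypothetical and do not reflect the actual format of the CI disruption data.

```python
from dataclasses import dataclass
from datetime import datetime

# Error text quoted in the analysis above.
SIGNATURE = "etcdserver: leader changed"


@dataclass
class Sample:
    """Hypothetical shape of one disruption sample (an assumption)."""
    timestamp: datetime
    status: int
    message: str


def matches_signature(sample):
    """True if the sample has code 500 and carries the etcd error text."""
    return sample.status == 500 and SIGNATURE in sample.message


def count_in_window(samples, start, end):
    """Count matching samples with start <= timestamp <= end.

    The window would be the interval in which the openshift-apiserver and
    authentication operators are upgrading; the caller supplies the bounds.
    """
    return sum(
        1
        for sample in samples
        if start <= sample.timestamp <= end and matches_signature(sample)
    )
```

Comparing the count inside the operator upgrade window with the count outside it would show whether the errors cluster around the upgrade, as the analysis suggests.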

              Unassigned
              Ken Zhang (kenzhang@redhat.com)
              Ke Wang