OpenShift Bugs / OCPBUGS-34408

[Upgrade] kube-apiserver stuck updating versions when upgrading from old releases


    • Severity: Critical
    • Release Note Type: Release Note Not Required
    • Status: In Progress

      This is a clone of issue OCPBUGS-33963. The following is the description of the original issue:

      Description of problem:

      kube-apiserver was stuck updating versions during an upgrade from 4.1 to 4.16 on an AWS IPI installation.
          

      Version-Release number of selected component (if applicable):

      4.16.0-0.nightly-2024-05-01-111315
          

      How reproducible:

          always
          

      Steps to Reproduce:

          1. Install an AWS IPI 4.1 cluster and upgrade it toward 4.16 hop by hop (an example command for each hop is sketched after these steps)
          2. The upgrade got stuck on the 4.15 -> 4.16 hop, waiting on etcd and kube-apiserver to finish updating
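          For reference, each intermediate hop in such a chained upgrade is typically driven with something like the following; the release image digest is a placeholder, not taken from this report:
          $ oc adm upgrade --allow-explicit-upgrade --force \
              --to-image=quay.io/openshift-release-dev/ocp-release@sha256:<digest>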
          
          

      Actual results:

         1. The upgrade was stuck on the 4.15 -> 4.16 hop, waiting on etcd and kube-apiserver to update their versions
         $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.15.0-0.nightly-2024-05-16-091947   True        True          39m     Working towards 4.16.0-0.nightly-2024-05-16-092402: 111 of 894 done (12% complete)
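      The full Failing condition message can be surfaced directly from the ClusterVersion status with something like the following (an illustrative live-cluster check, not part of the original report):
      $ oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'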
      
          

      Expected results:

      Upgrade should be successful.
          

      Additional info:

      Must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.1-aws-ipi-f30/1791391925467615232/artifacts/aws-ipi-f30/gather-must-gather/artifacts/must-gather.tar
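      The archive can be inspected locally with the o-must-gather (omg) tool before running the omg commands below; the paths here are illustrative:
      $ tar xf must-gather.tar
      $ omg use ./must-gather/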
      
      Checked the must-gather logs:
      $ omg get clusterversion -oyaml
      ...
      conditions:
        - lastTransitionTime: '2024-05-17T09:35:29Z'
          message: Done applying 4.15.0-0.nightly-2024-05-16-091947
          status: 'True'
          type: Available
        - lastTransitionTime: '2024-05-18T06:31:41Z'
          message: 'Multiple errors are preventing progress:
      
            * Cluster operator kube-apiserver is updating versions
      
            * Could not update flowschema "openshift-etcd-operator" (82 of 894): the server
            does not recognize this resource, check extension API servers'
          reason: MultipleErrors
          status: 'True'
          type: Failing
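      The flowschema error indicates that the apiserver handling the request does not yet serve the flowcontrol API version the payload manifest expects. Which flowcontrol versions the live apiserver actually advertises can be confirmed from API discovery (an illustrative check, not taken from the must-gather):
      $ oc get --raw /apis/flowcontrol.apiserver.k8s.io | jq -r '.versions[].version'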
      
      $ omg get co | grep -v '.*True.*False.*False'
      NAME                                      VERSION                             AVAILABLE  PROGRESSING  DEGRADED  SINCE
      kube-apiserver                            4.15.0-0.nightly-2024-05-16-091947  True       True         False     10m
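      Per-node static pod revision progress for kube-apiserver can be read from the operator resource to see which node is lagging behind the target revision (illustrative, against the live cluster):
      $ oc get kubeapiserver cluster -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{" current="}{.currentRevision}{" target="}{.targetRevision}{"\n"}{end}'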
      
      $ omg get pod -n openshift-kube-apiserver
      NAME                                               READY  STATUS     RESTARTS  AGE
      installer-40-ip-10-0-136-146.ec2.internal          0/1    Succeeded  0         2h29m
      installer-41-ip-10-0-143-206.ec2.internal          0/1    Succeeded  0         2h25m
      installer-43-ip-10-0-154-116.ec2.internal          0/1    Succeeded  0         2h22m
      installer-44-ip-10-0-154-116.ec2.internal          0/1    Succeeded  0         1h35m
      kube-apiserver-guard-ip-10-0-136-146.ec2.internal  1/1    Running    0         2h24m
      kube-apiserver-guard-ip-10-0-143-206.ec2.internal  1/1    Running    0         2h24m
      kube-apiserver-guard-ip-10-0-154-116.ec2.internal  0/1    Running    0         2h24m
      kube-apiserver-ip-10-0-136-146.ec2.internal        5/5    Running    0         2h27m
      kube-apiserver-ip-10-0-143-206.ec2.internal        5/5    Running    0         2h24m
      kube-apiserver-ip-10-0-154-116.ec2.internal        4/5    Running    17        1h34m
      revision-pruner-39-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h44m
      revision-pruner-39-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h50m
      revision-pruner-39-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h52m
      revision-pruner-40-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h29m
      revision-pruner-40-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h29m
      revision-pruner-40-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h29m
      revision-pruner-41-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h26m
      revision-pruner-41-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h26m
      revision-pruner-41-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h26m
      revision-pruner-42-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h24m
      revision-pruner-42-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h23m
      revision-pruner-42-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h23m
      revision-pruner-43-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         2h23m
      revision-pruner-43-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         2h23m
      revision-pruner-43-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         2h23m
      revision-pruner-44-ip-10-0-136-146.ec2.internal    0/1    Succeeded  0         1h35m
      revision-pruner-44-ip-10-0-143-206.ec2.internal    0/1    Succeeded  0         1h35m
      revision-pruner-44-ip-10-0-154-116.ec2.internal    0/1    Succeeded  0         1h35m
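      kube-apiserver-ip-10-0-154-116.ec2.internal is only 4/5 ready with 17 restarts; which container is restarting can be checked with something like the following (illustrative, against the live cluster):
      $ oc -n openshift-kube-apiserver get pod kube-apiserver-ip-10-0-154-116.ec2.internal \
          -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.ready}{"\n"}{end}'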
      
      Checked the kube-apiserver-ip-10-0-154-116.ec2.internal logs; something seems wrong with the informers:
      $ grep 'informers not started yet' current.log  | wc -l
      360
      
      $ grep 'informers not started yet' current.log 
      2024-05-18T06:34:51.888804183Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.Secret *v1.FlowSchema *v1.ConfigMap]
      2024-05-18T06:34:51.889350484Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema *v1.Secret *v1.ConfigMap]
      2024-05-18T06:34:52.004808401Z [-]informer-sync failed: 2 informers not started yet: [*v1.FlowSchema *v1.PriorityLevelConfiguration]
      2024-05-18T06:34:52.095516498Z [-]informer-sync failed: 2 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema]
      ...
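      The same informer-sync health check that produces these log lines can also be queried directly against the apiserver's readiness endpoint (a hypothetical live check, not from the must-gather):
      $ oc get --raw '/readyz?verbose' | grep informer-sync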
      
      
          

            Assignee: Ben Luddy
            Reporter: OpenShift Prow Bot
            QA Contact: Ke Wang