OpenShift Bugs / OCPBUGS-19412

Updating to "4.14.0-rc.1" from "4.14.0-rc.0" for 5 hours: Unable to apply 4.14.0-rc.1: wait has exceeded 40 minutes for these operators: authentication, openshift-apiserver | Updated after manually deleting stuck pods



      Description of problem:

      When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
      ClusterID: 513b8753-04c6-4a3a-988a-2d92b95e48f9
      ClusterVersion: Updating to "4.14.0-rc.1" from "4.14.0-rc.0" for 5 hours: Unable to apply 4.14.0-rc.1: wait has exceeded 40 minutes for these operators: authentication, openshift-apiserver
      ClusterOperators:
              clusteroperator/authentication is degraded because APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()
      OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()
              clusteroperator/machine-config is degraded because Unable to apply 4.14.0-rc.1: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)]]
              clusteroperator/network is progressing: Deployment "/openshift-ovn-kubernetes/ovnkube-control-plane" is not available (awaiting 1 nodes)
              clusteroperator/openshift-apiserver is degraded because APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
      error: gather never finished for pod must-gather-wvxzn: pods "must-gather-wvxzn" not found

      Version-Release number of selected component (if applicable):

      4.14.0-rc.0 updating to 4.14.0-rc.1

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

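      Actual results:

      The update stayed in progress for about 5 hours. master-1 stayed cordoned and could not be drained because three workload pods (mypod-c21gn, mypod-hxm70, mypod-nqldh) never finished terminating. As a result the master MachineConfigPool stayed degraded, and openshift-apiserver and authentication each reported 1 of 3 instances unavailable. The update completed only after the stuck pods were force-deleted by hand and the drain was re-run.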

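      Expected results:

      The update handles pods that are stuck terminating on a node being drained, without requiring them to be force-deleted by hand.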

      Additional info:

      oc get clusterversion -o yaml
      apiVersion: v1
      items:
      - apiVersion: config.openshift.io/v1
        kind: ClusterVersion
        metadata:
          creationTimestamp: "2023-09-14T19:20:36Z"
          generation: 3
          name: version
          resourceVersion: "4005495"
          uid: 7672d053-f9ed-43e3-a324-9a9bb85db483
        spec:
          channel: stable-4.14
          clusterID: 513b8753-04c6-4a3a-988a-2d92b95e48f9
          desiredUpdate:
            architecture: ""
            force: true
            image: registry.kni-qe-31.lab.eng.rdu2.redhat.com:5000/openshift-release-dev:4.14.0-rc.1-x86_64
            version: ""
        status:
          availableUpdates: null
          capabilities:
            enabledCapabilities:
            - Build
            - CSISnapshot
            - Console
            - DeploymentConfig
            - ImageRegistry
            - Insights
            - MachineAPI
            - NodeTuning
            - Storage
            - baremetal
            - marketplace
            - openshift-samples
            knownCapabilities:
            - Build
            - CSISnapshot
            - Console
            - DeploymentConfig
            - ImageRegistry
            - Insights
            - MachineAPI
            - NodeTuning
            - Storage
            - baremetal
            - marketplace
            - openshift-samples
          conditions:
          - lastTransitionTime: "2023-09-14T19:20:40Z"
            message: 'Unable to retrieve available updates: currently reconciling cluster
              version 4.14.0-rc.1 not found in the "stable-4.14" channel'
            reason: VersionNotFound
            status: "False"
            type: RetrievedUpdates
          - lastTransitionTime: "2023-09-14T19:20:40Z"
            message: Capabilities match configured spec
            reason: AsExpected
            status: "False"
            type: ImplicitlyEnabledCapabilities
          - lastTransitionTime: "2023-09-14T19:20:40Z"
            message: Payload loaded version="4.14.0-rc.1" image="registry.kni-qe-31.lab.eng.rdu2.redhat.com:5000/openshift-release-dev:4.14.0-rc.1-x86_64"
              architecture="amd64"
            reason: PayloadLoaded
            status: "True"
            type: ReleaseAccepted
          - lastTransitionTime: "2023-09-14T20:08:35Z"
            message: Done applying 4.14.0-rc.0
            status: "True"
            type: Available
          - lastTransitionTime: "2023-09-19T14:55:40Z"
            message: Cluster operators authentication, openshift-apiserver are degraded
            reason: ClusterOperatorsDegraded
            status: "True"
            type: Failing
          - lastTransitionTime: "2023-09-19T13:14:45Z"
            message: 'Unable to apply 4.14.0-rc.1: wait has exceeded 40 minutes for these
              operators: authentication, openshift-apiserver'
            reason: ClusterOperatorsDegraded
            status: "True"
            type: Progressing
          - lastTransitionTime: "2023-09-19T13:57:22Z"
            message: 'Cluster operator machine-config should not be upgraded between minor
              versions: One or more machine config pools are degraded, please see `oc get
              mcp` for further details and resolve before upgrading'
            reason: DegradedPool
            status: "False"
            type: Upgradeable
          desired:
            image: registry.kni-qe-31.lab.eng.rdu2.redhat.com:5000/openshift-release-dev:4.14.0-rc.1-x86_64
            url: https://access.redhat.com/errata/RHSA-2023:5006
            version: 4.14.0-rc.1
          history:
          - acceptedRisks: |-
              Target release version="" image="registry.kni-qe-31.lab.eng.rdu2.redhat.com:5000/openshift-release-dev:4.14.0-rc.1-x86_64" cannot be verified, but continuing anyway because the update was forced: release images that are not accessed via digest cannot be verified
              Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate": RetrievedUpdates=False (VersionNotFound), so the recommended status of updating from 4.14.0-rc.0 to 4.14.0-rc.1 is unknown.
            completionTime: null
            image: registry.kni-qe-31.lab.eng.rdu2.redhat.com:5000/openshift-release-dev:4.14.0-rc.1-x86_64
            startedTime: "2023-09-19T13:14:45Z"
            state: Partial
            verified: false
            version: 4.14.0-rc.1
          - completionTime: "2023-09-14T20:08:35Z"
            image: quay.io/openshift-release-dev/ocp-release@sha256:1d2cc38cbd94c532dc822ff793f46b23a93b76b400f7d92b13c1e1da042c88fe
            startedTime: "2023-09-14T19:20:40Z"
            state: Completed
            verified: false
            version: 4.14.0-rc.0
          observedGeneration: 3
          versionHash: MQnicHcnnoQ=
      kind: List
      metadata:
        resourceVersion: ""
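The `acceptedRisks` entry above shows why `force: true` was needed: the 4.14.0-rc.1 target was referenced by tag (`...:4.14.0-rc.1-x86_64`), and release images that are not accessed via digest cannot be verified, while the completed 4.14.0-rc.0 entry used an `@sha256:` digest. A minimal sketch of a hypothetical `pullspec_kind` helper that classifies pullspecs read on stdin, so a tag-based target can be spotted before starting an update:

```shell
# Classify release-image pullspecs read one per line on stdin.
# A pullspec containing "@sha256:" is pinned by digest; anything else is
# treated as tag-based (the kind the update reported it could not verify).
pullspec_kind() {
  local spec
  while IFS= read -r spec; do
    [[ -z $spec ]] && continue          # skip blank lines
    if [[ $spec == *@sha256:* ]]; then
      printf 'digest\t%s\n' "$spec"
    else
      printf 'tag\t%s\n' "$spec"
    fi
  done
}

# Usage (pullspecs copied from the history entries above):
#   printf '%s\n' "$rc1_image" "$rc0_image" | pullspec_kind
```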
      
      oc get nodes
      NAME       STATUS                     ROLES                         AGE     VERSION
      master-0   Ready                      control-plane,master,worker   4d22h   v1.27.4+2c287eb
      master-1   Ready,SchedulingDisabled   control-plane,master,worker   4d23h   v1.27.4+2c83a9f
      master-2   Ready                      control-plane,master,worker   4d23h   v1.27.4+2c287eb
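In this node listing master-1 is cordoned (`SchedulingDisabled`) and reports a different kubelet build (`v1.27.4+2c83a9f`) from master-0 and master-2 (`v1.27.4+2c287eb`), which is consistent with the master pool status `updated: 2, unavailable: 1`. A minimal sketch, assuming the default `oc get nodes` columns (NAME, STATUS, ROLES, AGE, VERSION), of two hypothetical helpers that surface both signals from the table on stdin:

```shell
# Print nodes whose STATUS column includes "SchedulingDisabled" (cordoned).
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Print nodes whose kubelet VERSION (last column) differs from the version
# reported by most nodes. On a tie the "majority" version is arbitrary.
kubelet_version_outliers() {
  awk 'NR > 1 { ver[$1] = $NF; count[$NF]++ }
       END {
         best = 0
         for (v in count) if (count[v] > best) { best = count[v]; common = v }
         for (n in ver) if (ver[n] != common) print n
       }'
}

# Usage against a live cluster:
#   oc get nodes | cordoned_nodes
#   oc get nodes | kubelet_version_outliers
```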
      
      
       oc get co
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.0-rc.1   True        False         True       4d22h   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
      baremetal                                  4.14.0-rc.1   True        False         False      4d22h
      cloud-controller-manager                   4.14.0-rc.1   True        False         False      4d23h
      cloud-credential                           4.14.0-rc.1   True        False         False      4d23h
      cluster-autoscaler                         4.14.0-rc.1   True        False         False      4d22h
      config-operator                            4.14.0-rc.1   True        False         False      4d22h
      console                                    4.14.0-rc.1   True        False         False      4d22h
      control-plane-machine-set                  4.14.0-rc.1   True        False         False      4d22h
      csi-snapshot-controller                    4.14.0-rc.1   True        False         False      4d22h
      dns                                        4.14.0-rc.1   True        False         False      4d22h
      etcd                                       4.14.0-rc.1   True        False         False      4d22h
      image-registry                             4.14.0-rc.1   True        False         False      4h23m
      ingress                                    4.14.0-rc.1   True        False         False      4d22h
      insights                                   4.14.0-rc.1   True        False         False      4d22h
      kube-apiserver                             4.14.0-rc.1   True        False         False      4d22h
      kube-controller-manager                    4.14.0-rc.1   True        False         False      4d22h
      kube-scheduler                             4.14.0-rc.1   True        False         False      4d22h
      kube-storage-version-migrator              4.14.0-rc.1   True        False         False      4h35m
      machine-api                                4.14.0-rc.1   True        False         False      4d22h
      machine-approver                           4.14.0-rc.1   True        False         False      4d22h
      machine-config                             4.14.0-rc.0   True        True          True       4d22h   Unable to apply 4.14.0-rc.1: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)]]
      marketplace                                4.14.0-rc.1   True        False         False      4d22h
      monitoring                                 4.14.0-rc.1   True        False         False      4d22h
      network                                    4.14.0-rc.1   True        True          False      4d22h   Deployment "/openshift-ovn-kubernetes/ovnkube-control-plane" is not available (awaiting 1 nodes)
      node-tuning                                4.14.0-rc.1   True        False         False      4d22h
      openshift-apiserver                        4.14.0-rc.1   True        False         True       4d22h   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
      openshift-controller-manager               4.14.0-rc.1   True        False         False      4d22h
      openshift-samples                          4.14.0-rc.1   True        False         False      5h4m
      operator-lifecycle-manager                 4.14.0-rc.1   True        False         False      4d22h
      operator-lifecycle-manager-catalog         4.14.0-rc.1   True        False         False      4d22h
      operator-lifecycle-manager-packageserver   4.14.0-rc.1   True        False         False      4d22h
      service-ca                                 4.14.0-rc.1   True        False         False      4d22h
      storage                                    4.14.0-rc.1   True        False         False      4d22h
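This table shows three operators Degraded (authentication, machine-config, openshift-apiserver), two still Progressing (machine-config, network), and machine-config as the only operator still at 4.14.0-rc.0. A minimal sketch of hypothetical helpers that pull those names out of `oc get co` output on stdin, assuming the column order shown above and that every row has a VERSION value (an empty VERSION would shift the columns):

```shell
# Assumed columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE

# Print ClusterOperators whose DEGRADED column (5) is "True".
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Print ClusterOperators whose PROGRESSING column (4) is "True".
progressing_operators() {
  awk 'NR > 1 && $4 == "True" { print $1 }'
}

# Print ClusterOperators whose VERSION column (2) is not the given target.
# Arg: target version, e.g. 4.14.0-rc.1
operators_not_at() {
  awk -v want="$1" 'NR > 1 && $2 != want { print $1 }'
}

# Usage against a live cluster:
#   oc get co | degraded_operators
#   oc get co | operators_not_at 4.14.0-rc.1
```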
      
      
      
      
      oc get pods -n openshift-apiserver
      NAME                         READY   STATUS    RESTARTS   AGE
      apiserver-85c5fb6d7c-25mqf   2/2     Running   0          5h10m
      apiserver-85c5fb6d7c-bzgft   0/2     Pending   0          4h53m
      apiserver-85c5fb6d7c-ms9nk   2/2     Running   0          5h2m
      [kni@registry.kni-qe-31 post-config]$ oc get events -n openshift-apiserver
      LAST SEEN   TYPE      REASON             OBJECT                           MESSAGE
      12m         Warning   FailedScheduling   pod/apiserver-85c5fb6d7c-bzgft   0/3 nodes are available: 1 node(s) were unschedulable, 2 node(s) didn't match pod anti-affinity rules. preemption: 0/3 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) didn't match pod anti-affinity rules..
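The event explains the Pending replica: the scheduler rejected master-1 because it is unschedulable, and rejected master-0 and master-2 because of the Deployment's pod anti-affinity rules. With three replicas, three control-plane nodes, and one node cordoned, one replica cannot be placed until master-1 is uncordoned; the `1 of 3` messages from the authentication operator plausibly have the same cause. A minimal sketch of that arithmetic, assuming the anti-affinity rule is required and allows at most one replica per node (the event implies a hard rule, but the report does not show the Deployment spec); `pending_pods` and `unplaceable_replicas` are hypothetical helper names:

```shell
# Print pods whose STATUS column (3) is "Pending" in `oc get pods -n <ns>` output.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Replicas that cannot be placed when at most one replica fits per node.
# Args: desired replica count, number of schedulable (not cordoned) nodes.
unplaceable_replicas() {
  local replicas=$1 schedulable_nodes=$2
  if (( replicas > schedulable_nodes )); then
    echo $(( replicas - schedulable_nodes ))
  else
    echo 0
  fi
}

# Usage:
#   oc get pods -n openshift-apiserver | pending_pods
#   unplaceable_replicas 3 2   # 3 apiserver replicas, 2 uncordoned masters
```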
      
      oc get events -A | grep machine-config
      openshift-machine-config-operator            22m         Warning   OperatorDegraded: RequiredPoolsFailed   /machine-config                                                    Unable to apply 4.14.0-rc.1: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)]]
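The MCO message packs the master pool's counters into one line (`total: 3, ready 2, updated: 2, unavailable: 1`): one master is unavailable, which matches the cordoned master-1. A minimal sketch of a hypothetical `mcp_counts` helper that extracts those numbers from such a message on stdin, assuming the exact wording shown above (note there is no colon after `ready`); on a live cluster, `oc get mcp master` shows the pool's machine counts directly:

```shell
# Extract the pool counters from an MCO "MachineConfigPool ... is not ready"
# message read on stdin and print them as key=value pairs.
# Assumed wording: "total: N, ready N, updated: N, unavailable: N"
mcp_counts() {
  grep -o 'total: [0-9]*, ready [0-9]*, updated: [0-9]*, unavailable: [0-9]*' \
    | head -n 1 \
    | tr -d ',:' \
    | awk '{ printf "total=%s ready=%s updated=%s unavailable=%s\n", $2, $4, $6, $8 }'
}

# Usage:
#   oc get events -A | grep machine-config | mcp_counts
```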
      
      
      
      oc logs machine-config-daemon-drb4p -c machine-config-daemon -n openshift-machine-config-operator
      
      oc adm drain master-1 --grace-period=20 --ignore-daemonsets --force=true --delete-emptydir-data --timeout=60s
      node/master-1 already cordoned
      Warning: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-lvg5p, openshift-dns/dns-default-l8kzl, openshift-dns/node-resolver-vnk7g, openshift-image-registry/node-ca-kwtw4, openshift-ingress-canary/ingress-canary-jcnff, openshift-local-storage/diskmaker-manager-7scmt, openshift-logging/collector-fjbwq, openshift-machine-api/ironic-proxy-6hdx6, openshift-machine-config-operator/machine-config-daemon-drb4p, openshift-machine-config-operator/machine-config-server-zzsfq, openshift-monitoring/node-exporter-dt26h, openshift-multus/multus-7fblb, openshift-multus/multus-additional-cni-plugins-wssh8, openshift-multus/network-metrics-daemon-ptf2k, openshift-network-diagnostics/network-check-target-gdpch, openshift-ovn-kubernetes/ovnkube-node-pw7w6, openshift-sriov-network-operator/network-resources-injector-lv5kj, openshift-sriov-network-operator/operator-webhook-ntnn6, openshift-sriov-network-operator/sriov-device-plugin-896fr, openshift-sriov-network-operator/sriov-network-config-daemon-ctmc4, openshift-storage/csi-cephfsplugin-2xgf4, openshift-storage/csi-rbdplugin-ct9lp
      evicting pod nqldh/mypod-nqldh
      evicting pod c21gn/mypod-c21gn
      evicting pod hxm70/mypod-hxm70
      There are pending pods in node "master-1" when an error occurred: [error when waiting for pod "mypod-c21gn" terminating: global timeout reached: 1m0s, error when waiting for pod "mypod-hxm70" terminating: global timeout reached: 1m0s, error when waiting for pod "mypod-nqldh" terminating: global timeout reached: 1m0s]
      pod/mypod-c21gn
      pod/mypod-hxm70
      pod/mypod-nqldh
      error: unable to drain node "master-1" due to error:[error when waiting for pod "mypod-c21gn" terminating: global timeout reached: 1m0s, error when waiting for pod "mypod-hxm70" terminating: global timeout reached: 1m0s, error when waiting for pod "mypod-nqldh" terminating: global timeout reached: 1m0s], continuing command...
      There are pending nodes to be drained:
       master-1
      error when waiting for pod "mypod-c21gn" terminating: global timeout reached: 1m0s
      error when waiting for pod "mypod-hxm70" terminating: global timeout reached: 1m0s
      error when waiting for pod "mypod-nqldh" terminating: global timeout reached: 1m0s
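All three failures have the same shape: the evictions were issued, but the pods never finished terminating within the 1m0s `--timeout`. `--force` only lets the drain proceed for pods without a controller, and `--grace-period=20` only shortens the termination grace period, so neither makes the drain stop waiting for termination. A minimal sketch of a hypothetical `stuck_pods_from_drain` helper that collects the pod names from drain output like the above, so they can be inspected (for example with `oc describe pod`; the report does not say why these pods hung):

```shell
# Print the unique pod names from
#   error when waiting for pod "<name>" terminating
# messages in `oc adm drain` output read on stdin.
stuck_pods_from_drain() {
  grep -o 'error when waiting for pod "[^"]*" terminating' \
    | sed 's/^error when waiting for pod "\([^"]*\)" terminating$/\1/' \
    | LC_ALL=C sort -u
}

# Usage (drain.log is a hypothetical file holding the drain output):
#   stuck_pods_from_drain < drain.log
```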
      
      The forced drain still failed, so I force-deleted these three pods and then ran the drain again:
      oc adm drain master-1 --grace-period=20 --ignore-daemonsets --force=true --delete-emptydir-data --timeout=60s
      node/master-1 already cordoned
      Warning: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-lvg5p, openshift-dns/dns-default-l8kzl, openshift-dns/node-resolver-vnk7g, openshift-image-registry/node-ca-kwtw4, openshift-ingress-canary/ingress-canary-jcnff, openshift-local-storage/diskmaker-manager-7scmt, openshift-logging/collector-fjbwq, openshift-machine-api/ironic-proxy-6hdx6, openshift-machine-config-operator/machine-config-daemon-drb4p, openshift-machine-config-operator/machine-config-server-zzsfq, openshift-monitoring/node-exporter-dt26h, openshift-multus/multus-7fblb, openshift-multus/multus-additional-cni-plugins-wssh8, openshift-multus/network-metrics-daemon-ptf2k, openshift-network-diagnostics/network-check-target-gdpch, openshift-ovn-kubernetes/ovnkube-node-pw7w6, openshift-sriov-network-operator/network-resources-injector-lv5kj, openshift-sriov-network-operator/operator-webhook-ntnn6, openshift-sriov-network-operator/sriov-device-plugin-896fr, openshift-sriov-network-operator/sriov-network-config-daemon-ctmc4, openshift-storage/csi-cephfsplugin-2xgf4, openshift-storage/csi-rbdplugin-ct9lp
      node/master-1 drained
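The report does not show the exact command used to force-delete the three pods before the second drain succeeded. A minimal sketch, assuming a force delete with `--grace-period=0 --force` was the intent, of a hypothetical `force_delete_cmds` helper that turns the drain's `evicting pod <namespace>/<name>` lines into `oc delete` commands. It only prints the commands so they can be reviewed first, because a force delete removes the pod from the API immediately and bypasses graceful deletion:

```shell
# Turn "evicting pod <namespace>/<name>" lines from `oc adm drain` output
# (read on stdin) into force-delete commands. Prints only; nothing is run.
force_delete_cmds() {
  sed -n 's#^[[:space:]]*evicting pod \([^/]*\)/\(.*\)$#oc delete pod \2 -n \1 --grace-period=0 --force#p'
}

# Usage (drain.log is a hypothetical file holding the first drain's output):
#   force_delete_cmds < drain.log > delete-stuck-pods.sh
#   # review delete-stuck-pods.sh, then: sh delete-stuck-pods.sh
```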
      
      oc adm uncordon master-1
      node/master-1 uncordoned
      [kni@registry.kni-qe-31 post-config]$ oc get nodes
      NAME       STATUS     ROLES                         AGE     VERSION
      master-0   Ready      control-plane,master,worker   4d23h   v1.27.4+2c287eb
      master-1   NotReady   control-plane,master,worker   5d      v1.27.4+2c83a9f
      master-2   Ready      control-plane,master,worker   5d      v1.27.4+2c287eb
      
      oc get clusterversion
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-rc.1   True        False         27s     Cluster version is 4.14.0-rc.1
      
      
      The update finally completed, but why does the update process not handle stuck pods on its own?

      Petr Muller (afri@afri.cz), Mike Lammon (mlammon@redhat.com), Jia Liu