-
Bug
-
Resolution: Cannot Reproduce
-
Undefined
-
None
-
4.16
-
None
-
No
-
False
-
Description of problem:
periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-fullyprivate-proxy-arm-f28
Running command: oc adm upgrade --to-image=registry.build02.ci.openshift.org/ci-op-y87p0c68/release@sha256:0453dcf90e6f1e6ba6b8eb197d520c95f47cb2e3906dc4d98902f10412f90ceb --allow-explicit-upgrade --force=true
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.build02.ci.openshift.org/ci-op-y87p0c68/release@sha256:0453dcf90e6f1e6ba6b8eb197d520c95f47cb2e3906dc4d98902f10412f90ceb
Upgrading cluster to registry.build02.ci.openshift.org/ci-op-y87p0c68/release@sha256:0453dcf90e6f1e6ba6b8eb197d520c95f47cb2e3906dc4d98902f10412f90ceb gets started...
Starting the upgrade checking on 2024-06-30 18:24:46
Running command: oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.19   True        True          4m44s   Working towards 4.16.0-0.nightly-multi-2024-06-27-053432: 110 of 894 done (12% complete), waiting on etcd, kube-apiserver
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-multi-2024-06-27-053432
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
# oc adm upgrade status
Unable to fetch alerts, ignoring alerts in 'Update Health': failed to get alerts from Thanos: no token is currently in use for this session

= Control Plane =
Assessment:      Progressing
Target Version:  4.16.0-0.nightly-multi-2024-06-27-053432 (from 4.15.19)
Completion:      97%
Duration:        2h10m21.917016007s
Operator Status: 25 Healthy, 1 Unavailable, 7 Available but degraded

Control Plane Nodes
NAME                                  ASSESSMENT    PHASE      VERSION                                    EST    MESSAGE
ci-op-y87p0c68-80996-qqdwx-master-2   Progressing   Updating   4.15.19                                    +20m
ci-op-y87p0c68-80996-qqdwx-master-0   Completed     Updated    4.16.0-0.nightly-multi-2024-06-27-053432   -
ci-op-y87p0c68-80996-qqdwx-master-1   Completed     Updated    4.16.0-0.nightly-multi-2024-06-27-053432   -

= Worker Upgrade =

= Worker Pool =
Worker Pool:     worker
Assessment:      Completed
Completion:      100%
Worker Status:   4 Total, 4 Available, 0 Progressing, 0 Outdated, 0 Draining, 0 Excluded, 0 Degraded

Worker Pool Nodes
NAME                                                      ASSESSMENT   PHASE     VERSION                                    EST   MESSAGE
ci-op-y87p0c68-80996-qqdwx-41804-k7z6m                    Completed    Updated   4.16.0-0.nightly-multi-2024-06-27-053432   -
ci-op-y87p0c68-80996-qqdwx-worker-southcentralus1-w4dc4   Completed    Updated   4.16.0-0.nightly-multi-2024-06-27-053432   -
ci-op-y87p0c68-80996-qqdwx-worker-southcentralus2-8fg97   Completed    Updated   4.16.0-0.nightly-multi-2024-06-27-053432   -
ci-op-y87p0c68-80996-qqdwx-worker-southcentralus3-2kskb   Completed    Updated   4.16.0-0.nightly-multi-2024-06-27-053432   -

= Worker Pool =
Worker Pool:     worker-pao
Assessment:      Completed
Completion:      100%
Worker Status:   1 Total, 1 Available, 0 Progressing, 0 Outdated, 0 Draining, 0 Excluded, 0 Degraded

Worker Pool Node
NAME                                                      ASSESSMENT   PHASE     VERSION                                    EST   MESSAGE
ci-op-y87p0c68-80996-qqdwx-worker-southcentralus1-w4dc4   Completed    Updated   4.16.0-0.nightly-multi-2024-06-27-053432   -

= Update Health =
Message: Cluster Operator authentication is degraded (APIServerDeployment_UnavailablePod::OAuthServerDeployment_UnavailablePod)
  Since: 43m58s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: authentication
  Description: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver (),
    OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()

Message: Cluster Operator machine-config is degraded (MachineConfigDaemonFailed)
  Since: 44m28s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: machine-config
  Description: Unable to apply 4.16.0-0.nightly-multi-2024-06-27-053432: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]

Message: Cluster Operator openshift-apiserver is degraded (APIServerDeployment_UnavailablePod)
  Since: 44m32s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: openshift-apiserver
  Description: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()

Message: Cluster Operator etcd is degraded (EtcdCertSignerController_Error::EtcdEndpoints_ErrorUpdatingEtcdEndpoints::EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady)
  Since: 1h8m47s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: etcd
  Description: EtcdCertSignerControllerDegraded: EtcdCertSignerController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [{Member:ID:1612627219685692483 name:"ci-op-y87p0c68-80996-qqdwx-master-2" peerURLs:"https://10.0.0.7:2380" clientURLs:"https://10.0.0.7:2379" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.0.7:2379]: context deadline exceeded} {Member:ID:7859641474667542729 name:"ci-op-y87p0c68-80996-qqdwx-master-0" peerURLs:"https://10.0.0.8:2380" clientURLs:"https://10.0.0.8:2379" Healthy:true Took:1.638123ms Error:<nil>} {Member:ID:16394202218457999149 name:"ci-op-y87p0c68-80996-qqdwx-master-1" peerURLs:"https://10.0.0.6:2380" clientURLs:"https://10.0.0.6:2379" Healthy:true Took:2.867525ms Error:<nil>}],
    EtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [{Member:ID:1612627219685692483 name:"ci-op-y87p0c68-80996-qqdwx-master-2" peerURLs:"https://10.0.0.7:2380" clientURLs:"https://10.0.0.7:2379" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.0.7:2379]: context deadline exceeded} {Member:ID:7859641474667542729 name:"ci-op-y87p0c68-80996-qqdwx-master-0" peerURLs:"https://10.0.0.8:2380" clientURLs:"https://10.0.0.8:2379" Healthy:true Took:3.586207ms Error:<nil>} {Member:ID:16394202218457999149 name:"ci-op-y87p0c68-80996-qqdwx-master-1" peerURLs:"https://10.0.0.6:2380" clientURLs:"https://10.0.0.6:2379" Healthy:true Took:1.774003ms Error:<nil>}],
    EtcdMembersDegraded: 2 of 3 members are available, ci-op-y87p0c68-80996-qqdwx-master-2 is unhealthy,
    NodeControllerDegraded: The master nodes not ready: node "ci-op-y87p0c68-80996-qqdwx-master-2" not ready since 2024-06-30 19:24:33 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)

Message: Cluster Operator kube-apiserver is degraded (NodeController_MasterNodesReady)
  Since: 1h8m50s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: kube-apiserver
  Description: NodeControllerDegraded: The master nodes not ready: node "ci-op-y87p0c68-80996-qqdwx-master-2" not ready since 2024-06-30 19:24:33 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)

Message: Cluster Operator kube-controller-manager is degraded (NodeController_MasterNodesReady)
  Since: 1h8m50s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: kube-controller-manager
  Description: NodeControllerDegraded: The master nodes not ready: node "ci-op-y87p0c68-80996-qqdwx-master-2" not ready since 2024-06-30 19:24:33 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)

Message: Cluster Operator kube-scheduler is degraded (NodeController_MasterNodesReady)
  Since: 1h8m50s
  Level: Error
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: kube-scheduler
  Description: NodeControllerDegraded: The master nodes not ready: node "ci-op-y87p0c68-80996-qqdwx-master-2" not ready since 2024-06-30 19:24:33 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)

Message: Cluster Operator control-plane-machine-set is unavailable (UnavailableReplicas)
  Since: 19m45s
  Level: Warning
  Impact: API Availability
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDown.md
  Resources:
    clusteroperators.config.openshift.io: control-plane-machine-set
  Description: Missing 1 available replica(s)

Message: Cluster Version version is failing to proceed with the update (ClusterOperatorsDegraded)
  Since: 28m41s
  Level: Warning
  Impact: Update Stalled
  Reference: https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusterversions.config.openshift.io: version
  Description: Cluster operators etcd, kube-apiserver are degraded
Expected results:
Upgrade should pass
Additional info:
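The etcd messages under Actual results say the cluster "has quorum of 2 and 2 healthy members which is not fault tolerant". A minimal sketch of that arithmetic follows, assuming the usual majority-quorum rule (quorum = floor(n/2) + 1). The function names are illustrative only and are not taken from the etcd operator.

```python
def quorum_size(total_members: int) -> int:
    """Smallest majority of the cluster: floor(n / 2) + 1."""
    return total_members // 2 + 1


def fault_tolerance(total_members: int, healthy_members: int) -> int:
    """How many more members can fail before quorum is lost.

    A negative result means quorum is already lost.
    """
    return healthy_members - quorum_size(total_members)


# Values from this report: 3 control plane members, master-2 unhealthy.
total, healthy = 3, 2
print("quorum:", quorum_size(total))
print("additional failures tolerated:", fault_tolerance(total, healthy))
```

With 3 members and 2 healthy, quorum still holds but no further member can be lost, which matches the operator reporting the cluster as not fault tolerant while master-2 is NotReady.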