Bug
Resolution: Done-Errata
Major
None
4.14
No
Rejected
False
Description of problem:
As the title says, upgrading a Nutanix IPI disconnected cluster from 4.13 to 4.14 failed. Checking the logs of the pod in the openshift-etcd-operator namespace showed the following messages:
2023-09-02T21:58:56.826959297Z E0902 21:58:56.826925 1 base_controller.go:268] EtcdEndpointsController reconciliation failed: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [
{Member:ID:5365111498206899750 name:"ci-op-j1vt7ci9-6d143-z86rp-master-2" peerURLs:"https://10.0.133.232:2380" clientURLs:"https://10.0.133.232:2379" Healthy:true Took:1.048481ms Error:<nil>}{Member:ID:7970579734833654707 name:"ci-op-j1vt7ci9-6d143-z86rp-master-1" peerURLs:"https://10.0.133.237:2380" clientURLs:"https://10.0.133.237:2379" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.133.237:2379]: context deadline exceeded}
{Member:ID:16371899399695993943 name:"ci-op-j1vt7ci9-6d143-z86rp-master-0" peerURLs:"https://10.0.133.203:2380" clientURLs:"https://10.0.133.203:2379" Healthy:true Took:810.982µs Error:<nil>}
]
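The operator reports that the cluster "has quorum of 2 and 2 healthy members which is not fault tolerant". For a 3-member etcd cluster, quorum is floor(3/2) + 1 = 2. With master-1 unreachable, only 2 members are healthy, so losing one more member would lose quorum. The Go sketch below reproduces that arithmetic using the member names and Healthy values from the log. The `member` type and the `quorumSize` and `isFaultTolerant` helpers are hypothetical illustrations, not the etcd operator's actual code.

```go
package main

import "fmt"

// member is a hypothetical, simplified view of one etcd member health result,
// modeled on the name and Healthy fields printed in the operator log.
type member struct {
	Name    string
	Healthy bool
}

// quorumSize returns the number of members needed for quorum in an
// etcd cluster of n voting members: floor(n/2) + 1.
func quorumSize(n int) int {
	return n/2 + 1
}

// isFaultTolerant reports whether the cluster could lose one more healthy
// member and still keep quorum, i.e. whether healthy members exceed quorum.
func isFaultTolerant(members []member) (quorum, healthy int, ok bool) {
	quorum = quorumSize(len(members))
	for _, m := range members {
		if m.Healthy {
			healthy++
		}
	}
	return quorum, healthy, healthy > quorum
}

func main() {
	// Health state taken from the EtcdEndpointsController log entry above.
	members := []member{
		{"ci-op-j1vt7ci9-6d143-z86rp-master-2", true},
		{"ci-op-j1vt7ci9-6d143-z86rp-master-1", false},
		{"ci-op-j1vt7ci9-6d143-z86rp-master-0", true},
	}
	quorum, healthy, ok := isFaultTolerant(members)
	fmt.Printf("quorum=%d healthy=%d faultTolerant=%v\n", quorum, healthy, ok)
}
```

With all three members healthy, the same check returns `true`.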
2023-09-02T21:58:56.828429586Z I0902 21:58:56.828400 1 status_controller.go:213] clusteroperator/etcd diff {"status":{"conditions":[{"lastTransitionTime":"2023-09-02T21:54:10Z","message":"NodeControllerDegraded: The master nodes not ready: node \"ci-op-j1vt7ci9-6d143-z86rp-master-1\" not ready since 2023-09-02 21:58:15 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)\nEtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [
{Member:ID:7970579734833654707 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-1\" peerURLs:\"https://10.0.133.237:2380\" clientURLs:\"https://10.0.133.237:2379\" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.133.237:2379]: context deadline exceeded}
{Member:ID:16371899399695993943 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-0\" peerURLs:\"https://10.0.133.203:2380\" clientURLs:\"https://10.0.133.203:2379\" Healthy:true Took:810.982µs Error:\u003cnil\u003e}
]\nEtcdMembersDegraded: 2 of 3 members are available, ci-op-j1vt7ci9-6d143-z86rp-master-1 is unhealthy","reason":"AsExpected","status":"False","type":"Degraded"},
{"lastTransitionTime":"2023-09-02T21:02:02Z","message":"NodeInstallerProgressing: 3 nodes are at revision 9\nEtcdMembersProgressing: No unstarted etcd members found","reason":"AsExpected","status":"False","type":"Progressing"},
{"lastTransitionTime":"2023-09-02T19:36:12Z","message":"StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 9\nEtcdMembersAvailable: 2 of 3 members are available, ci-op-j1vt7ci9-6d143-z86rp-master-1 is unhealthy","reason":"AsExpected","status":"True","type":"Available"},
{"lastTransitionTime":"2023-09-02T19:34:31Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"},
{"lastTransitionTime":"2023-09-02T20:56:05Z","message":"UpgradeBackup for 4.13.11 is located at path /etc/kubernetes/cluster-backup/upgrade-backup-4.13.11-2023-09-02_205558 on node \"ci-op-j1vt7ci9-6d143-z86rp-master-0\"","reason":"UpgradeBackupSuccessful","status":"True","type":"RecentBackup"}]}}
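The clusteroperator/etcd condition diff above is hard to scan because the messages are long. As a hypothetical reading aid, the Go sketch below decodes a trimmed copy of that diff (messages and lastTransitionTime removed; type, status, and reason kept as logged) and prints one line per condition. The `condition` and `statusDiff` types and the `parseConditions` helper are illustrative assumptions, not part of any OpenShift library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition keeps only the fields of a status condition needed for a summary.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason"`
}

// statusDiff matches the {"status":{"conditions":[...]}} shape of the diff.
type statusDiff struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// diff is a trimmed copy of the clusteroperator/etcd diff from the log.
const diff = `{"status":{"conditions":[
{"reason":"AsExpected","status":"False","type":"Degraded"},
{"reason":"AsExpected","status":"False","type":"Progressing"},
{"reason":"AsExpected","status":"True","type":"Available"},
{"reason":"AsExpected","status":"True","type":"Upgradeable"},
{"reason":"UpgradeBackupSuccessful","status":"True","type":"RecentBackup"}]}}`

// parseConditions decodes the diff and returns its conditions in order.
func parseConditions(s string) ([]condition, error) {
	var d statusDiff
	if err := json.Unmarshal([]byte(s), &d); err != nil {
		return nil, err
	}
	return d.Status.Conditions, nil
}

func main() {
	conds, err := parseConditions(diff)
	if err != nil {
		panic(err)
	}
	for _, c := range conds {
		fmt.Printf("%-12s %-5s %s\n", c.Type, c.Status, c.Reason)
	}
}
```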
2023-09-02T21:58:56.839027235Z I0902 21:58:56.838968 1 event.go:298] Event(v1.ObjectReference
): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/etcd changed: Degraded message changed from "NodeControllerDegraded: The master nodes not ready: node \"ci-op-j1vt7ci9-6d143-z86rp-master-1\" not ready since 2023-09-02 21:58:15 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)\nEtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [
{Member:ID:5365111498206899750 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-2\" peerURLs:\"https://10.0.133.232:2380\" clientURLs:\"https://10.0.133.232:2379\" Healthy:true Took:1.065364ms Error:<nil>}{Member:ID:7970579734833654707 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-1\" peerURLs:\"https://10.0.133.237:2380\" clientURLs:\"https://10.0.133.237:2379\" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.133.237:2379]: context deadline exceeded}
{Member:ID:16371899399695993943 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-0\" peerURLs:\"https://10.0.133.203:2380\" clientURLs:\"https://10.0.133.203:2379\" Healthy:true Took:831.307µs Error:<nil>}
]\nEtcdMembersDegraded: 2 of 3 members are available, ci-op-j1vt7ci9-6d143-z86rp-master-1 is unhealthy" to "NodeControllerDegraded: The master nodes not ready: node \"ci-op-j1vt7ci9-6d143-z86rp-master-1\" not ready since 2023-09-02 21:58:15 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)\nEtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [
{Member:ID:5365111498206899750 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-2\" peerURLs:\"https://10.0.133.232:2380\" clientURLs:\"https://10.0.133.232:2379\" Healthy:true Took:1.048481ms Error:<nil>}{Member:ID:7970579734833654707 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-1\" peerURLs:\"https://10.0.133.237:2380\" clientURLs:\"https://10.0.133.237:2379\" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.133.237:2379]: context deadline exceeded}
{Member:ID:16371899399695993943 name:\"ci-op-j1vt7ci9-6d143-z86rp-master-0\" peerURLs:\"https://10.0.133.203:2380\" clientURLs:\"https://10.0.133.203:2379\" Healthy:true Took:810.982µs Error:<nil>}
]\nEtcdMembersDegraded: 2 of 3 members are available, ci-op-j1vt7ci9-6d143-z86rp-master-1 is unhealthy"
How reproducible:
Steps to Reproduce:
Upgrade the cluster from 4.13 to 4.14.
Actual results:
The upgrade failed with the following error:
2023-09-02T21:57:56.613937866Z E0902 21:57:56.613900 1 base_controller.go:268] EtcdCertSignerController reconciliation failed: EtcdCertSignerController can't evaluate whether quorum is safe: etcd cluster has quorum of 2 and 2 healthy members which is not fault tolerant: [
{Member:ID:7970579734833654707 name:"ci-op-j1vt7ci9-6d143-z86rp-master-1" peerURLs:"https://10.0.133.237:2380" clientURLs:"https://10.0.133.237:2379" Healthy:false Took: Error:create client failure: failed to make etcd client for endpoints [https://10.0.133.237:2379]: context deadline exceeded}
{Member:ID:16371899399695993943 name:"ci-op-j1vt7ci9-6d143-z86rp-master-0" peerURLs:"https://10.0.133.203:2380" clientURLs:"https://10.0.133.203:2379" Healthy:true Took:746.64µs Error:<nil>}
]
Expected results:
The upgrade succeeds.
Links to:
RHBA-2023:6837 OpenShift Container Platform 4.14.z bug fix update