Details
-
Bug
-
Resolution: Duplicate
-
Undefined
-
None
-
4.12.z
-
False
-
Description
Description of problem:
While deploying 3510 SNOs via ACM and ZTP, 5 of the 19 install failures occurred because the etcd operator reported a degraded state.
Version-Release number of selected component (if applicable):
Hub and SNO OCP 4.12.1
ACM 2.7.0-DOWNSTREAM-2023-01-26-20-15-10
How reproducible:
The 5 etcd-related failures account for more than 25% of the 19 install failures, but they affect only 5 of the 3510 SNO installs attempted (< 0.15% of all clusters).
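The two rates quoted above can be checked with a short calculation. This is only a sketch; the helper name `pct` is made up for illustration and is not part of any tooling used in this report.

```shell
# Print numerator/denominator as a percentage with three decimal places.
# "pct" is a hypothetical helper name used only for this illustration.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.3f\n", 100 * n / d }'
}

pct 5 19     # share of the install failures attributed to etcd
pct 5 3510   # share of all attempted SNO installs
```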
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
# cat install_failed_etcd | xargs -I % sh -c "echo -n '% '; oc --kubeconfig=/root/hv-vm/sno/manifests/%/kubeconfig get clusterversion --no-headers"
sno00389 version False False 4d Error while reconciling 4.12.1: the cluster operator etcd is degraded
sno00540 version False False 3d23h Error while reconciling 4.12.1: the cluster operator etcd is degraded
sno01227 version False False 3d21h Error while reconciling 4.12.1: the cluster operator etcd is degraded
sno01544 version False False 3d20h Error while reconciling 4.12.1: the cluster operator etcd is degraded
sno01958 version False False 3d21h Error while reconciling 4.12.1: the cluster operator etcd is degraded
sno03301 version False False 3d18h Error while reconciling 4.12.1: the cluster operator etcd is degraded

# cat install_failed_etcd | xargs -I % sh -c "echo -n '% '; oc --kubeconfig=/root/hv-vm/sno/manifests/%/kubeconfig get co etcd --no-headers"
sno00389 etcd 4.12.1 True True True 4d MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 4 on node: "sno00389" didn't show up, waited: 3m30s
sno00540 etcd 4.12.1 True True True 3d23h MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 3 on node: "sno00540" didn't show up, waited: 3m30s
sno01227 etcd 4.12.1 True True True 3d22h MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 4 on node: "sno01227" didn't show up, waited: 3m30s
sno01544 etcd 4.12.1 True True True 3d21h MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 3 on node: "sno01544" didn't show up, waited: 3m30s
sno01958 etcd 4.12.1 True True True 3d21h MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 3 on node: "sno01958" didn't show up, waited: 3m30s
sno03301 etcd 4.12.1 True True True 3d18h MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 3 on node: "sno03301" didn't show up, waited: 3m30s
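To compare the stuck revision across the affected nodes, the status lines can be reduced to a node/revision pair. The sketch below assumes the message wording shown in the output above; the function name `summarize_missing_pod` is hypothetical.

```shell
# Read `oc get co etcd --no-headers` style lines on stdin and print
# "<node> <revision>" for every MissingStaticPodControllerDegraded message.
# Assumes the message format seen in this report; lines that do not match are skipped.
summarize_missing_pod() {
  sed -n 's/.*for revision: \([0-9][0-9]*\) on node: "\([^"]*\)".*/\2 \1/p'
}

# Example with one status line copied from this report.
echo 'sno00389 etcd 4.12.1 True True True 4d MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 4 on node: "sno00389" didn'"'"'t show up, waited: 3m30s' \
  | summarize_missing_pod
```

The output of the second `xargs` command could be piped through the same function to get one line per cluster.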
And a describe on the operator from one of the affected SNOs:
# oc --kubeconfig /root/hv-vm/sno/manifests/sno00389/kubeconfig describe co etcd
Name: etcd
Namespace:
Labels: <none>
Annotations: exclude.release.openshift.io/internal-openshift-hosted: true
             include.release.openshift.io/self-managed-high-availability: true
             include.release.openshift.io/single-node-developer: true
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
  Creation Timestamp: 2023-01-27T20:55:29Z
  Generation: 1
  Managed Fields:
    API Version: config.openshift.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"d0009f5d-f6f8-45f1-9f5d-c33493a90ac6"}:
      f:spec:
    Manager: cluster-version-operator
    Operation: Update
    Time: 2023-01-27T20:55:29Z
    API Version: config.openshift.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:status:
        .:
        f:extension:
        f:relatedObjects:
    Manager: cluster-version-operator
    Operation: Update
    Subresource: status
    Time: 2023-01-27T20:55:30Z
    API Version: config.openshift.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:versions:
    Manager: cluster-etcd-operator
    Operation: Update
    Subresource: status
    Time: 2023-01-28T16:09:58Z
  Owner References:
    API Version: config.openshift.io/v1
    Kind: ClusterVersion
    Name: version
    UID: d0009f5d-f6f8-45f1-9f5d-c33493a90ac6
  Resource Version: 239508
  UID: e1b17331-afc0-4a50-9cd5-7b7fa6dc3b7b
Spec:
Status:
  Conditions:
    Last Transition Time: 2023-01-27T21:33:41Z
    Message: MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 4 on node: "sno00389" didn't show up, waited: 3m30s
    Reason: MissingStaticPodController_SyncError
    Status: True
    Type: Degraded
    Last Transition Time: 2023-01-27T21:27:21Z
    Message: NodeInstallerProgressing: 1 nodes are at revision 3; 0 nodes have achieved new revision 4
    Reason: NodeInstaller
    Status: True
    Type: Progressing
    Last Transition Time: 2023-01-27T21:27:12Z
    Message: StaticPodsAvailable: 1 nodes are active; 1 nodes are at revision 3; 0 nodes have achieved new revision 4
             EtcdMembersAvailable: 1 members are available
    Reason: AsExpected
    Status: True
    Type: Available
    Last Transition Time: 2023-01-27T21:24:10Z
    Message: All is well
    Reason: AsExpected
    Status: True
    Type: Upgradeable
    Last Transition Time: 2023-01-27T21:23:20Z
    Message: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
    Reason: ControllerStarted
    Status: Unknown
    Type: RecentBackup
  Extension: <nil>
  Related Objects:
    Group: operator.openshift.io
    Name: cluster
    Resource: etcds
    Group:
    Name: openshift-config
    Resource: namespaces
    Group:
    Name: openshift-config-managed
    Resource: namespaces
    Group:
    Name: openshift-etcd-operator
    Resource: namespaces
    Group:
    Name: openshift-etcd
    Resource: namespaces
  Versions:
    Name: raw-internal
    Version: 4.12.1
    Name: operator
    Version: 4.12.1
    Name: etcd
    Version: 4.12.1
Events: <none>
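When only the Degraded condition is of interest, a JSONPath filter avoids reading the full describe output. The following is a sketch, not the procedure used in this report: the function name `degraded_message` and the `OC` and `MANIFEST_ROOT` override variables are assumptions added for illustration, and the kubeconfig path layout is taken from the commands above.

```shell
# Print the message of the Degraded condition of the etcd cluster operator
# for one SNO, identified by its directory name under the manifests root.
# OC and MANIFEST_ROOT are hypothetical overrides; by default the function
# calls `oc` with the kubeconfig layout used elsewhere in this report.
degraded_message() {
  "${OC:-oc}" --kubeconfig "${MANIFEST_ROOT:-/root/hv-vm/sno/manifests}/$1/kubeconfig" \
    get co etcd -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'
}

# Possible usage over the list of failed clusters (requires cluster access):
#   while read -r sno; do printf '%s ' "$sno"; degraded_message "$sno"; echo; done < install_failed_etcd
```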
Attachments
Issue Links
- relates to OCPBUGS-3042 SNO etcd operator is degraded blocked cluster upgrade (Closed)