Type: Bug
Resolution: Duplicate
Priority: Major
None
Version: 4.16
None
Severity: Moderate
Yes
Rejected
False
Description of problem:
This is the same issue as https://issues.redhat.com/browse/OCPBUGS-17199 (CEO prevents member deletion during revision rollout).
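Since the linked bug describes the cluster-etcd-operator (CEO) blocking etcd member deletion while a revision rollout is in flight, one way to confirm that state is to compare the per-node etcd revisions (a minimal read-only sketch; nothing here is specific to this cluster):

# Mismatched current/target revisions indicate a revision rollout is still
# in progress, the condition named in OCPBUGS-17199.
oc get etcd cluster -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{": current="}{.currentRevision}{", target="}{.targetRevision}{"\n"}{end}'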
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-04-16-195622
How reproducible:
always
Steps to Reproduce:
1. Create an MHC for the control plane:

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: control-plane-health
  namespace: openshift-machine-api
spec:
  maxUnhealthy: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-type: master
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: "Unknown"
    timeout: 300s
    type: Ready

2. Apply it and verify:

oc create -f <above mhc.yaml>
oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
control-plane-health              1              3                  3
machine-api-termination-handler   100%           3                  3

3. Stop the kubelet service on a master node. A new master machine comes up Running, the old one is stuck in Deleting, and many cluster operators become degraded (see the diagnostic sketch after these steps).

oc debug no/skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal
W0418 12:37:38.271299   23817 warnings.go:70] metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/skundu-g3-hwnzk-master-0us-central1-acopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# systemctl stop kubelet

4. Check the machines and cluster operators:

oc get machines
NAME                             PHASE      TYPE            REGION        ZONE            AGE
skundu-g3-hwnzk-master-0         Deleting   n2-standard-4   us-central1   us-central1-a   3h3m
skundu-g3-hwnzk-master-1         Running    n2-standard-4   us-central1   us-central1-b   3h3m
skundu-g3-hwnzk-master-2         Running    n2-standard-4   us-central1   us-central1-c   3h3m
skundu-g3-hwnzk-master-b9dzr-0   Running    n2-standard-4   us-central1   us-central1-a   118m
skundu-g3-hwnzk-worker-a-slw45   Running    n2-standard-4   us-central1   us-central1-a   175m
skundu-g3-hwnzk-worker-b-7p2vr   Running    n2-standard-4   us-central1   us-central1-b   175m
skundu-g3-hwnzk-worker-c-xs4ck   Running    n2-standard-4   us-central1   us-central1-c   175m

oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.16.0-0.nightly-2024-04-16-195622   True        True          True       159m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver () OAuthServerDeploymentDegraded: 1 of 4 requested instances are unavailable for oauth-openshift.openshift-authentication ()
baremetal                                  4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m
cloud-controller-manager                   4.16.0-0.nightly-2024-04-16-195622   True        False         False      3h2m
cloud-credential                           4.16.0-0.nightly-2024-04-16-195622   True        False         False      3h5m
cluster-autoscaler                         4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m
config-operator                            4.16.0-0.nightly-2024-04-16-195622   True        False         False      179m
console                                    4.16.0-0.nightly-2024-04-16-195622   True        False         False      165m
control-plane-machine-set                  4.16.0-0.nightly-2024-04-16-195622   True        False         False      116m
csi-snapshot-controller                    4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m
dns                                        4.16.0-0.nightly-2024-04-16-195622   True        True          False      178m    DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
etcd                                       4.16.0-0.nightly-2024-04-16-195622   True        True          True       177m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry                             4.16.0-0.nightly-2024-04-16-195622   True        True          False      168m    Progressing: The registry is ready NodeCADaemonProgressing: The daemon set node-ca is deploying node pods
ingress                                    4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m
insights                                   4.16.0-0.nightly-2024-04-16-195622   True        False         False      172m
kube-apiserver                             4.16.0-0.nightly-2024-04-16-195622   True        True          True       175m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-controller-manager                    4.16.0-0.nightly-2024-04-16-195622   True        False         True       177m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler                             4.16.0-0.nightly-2024-04-16-195622   True        False         True       176m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-storage-version-migrator              4.16.0-0.nightly-2024-04-16-195622   True        False         False      123m
machine-api                                4.16.0-0.nightly-2024-04-16-195622   True        False         False      172m
machine-approver                           4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m
machine-config                             4.16.0-0.nightly-2024-04-16-195622   True        False         True       178m    Failed to resync 4.16.0-0.nightly-2024-04-16-195622 because: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]
marketplace                                4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m
monitoring                                 4.16.0-0.nightly-2024-04-16-195622   True        False         False      163m
network                                    4.16.0-0.nightly-2024-04-16-195622   True        True          False      3h      DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes) DaemonSet "/openshift-network-node-identity/network-node-identity" is not available (awaiting 1 nodes) DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes) DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
node-tuning                                4.16.0-0.nightly-2024-04-16-195622   True        True          False      117m    Working towards "4.16.0-0.nightly-2024-04-16-195622"
openshift-apiserver                        4.16.0-0.nightly-2024-04-16-195622   True        True          True       169m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager               4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m
openshift-samples                          4.16.0-0.nightly-2024-04-16-195622   True        False         False      173m
operator-lifecycle-manager                 4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m
operator-lifecycle-manager-catalog         4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m
operator-lifecycle-manager-packageserver   4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m
service-ca                                 4.16.0-0.nightly-2024-04-16-195622   True        False         False      179m
storage                                    4.16.0-0.nightly-2024-04-16-195622   True        True          False      179m    GCPPDCSIDriverOperatorCRProgressing: GCPPDDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
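To help triage why the machine hangs in Deleting, a couple of read-only checks may be useful (a minimal sketch; the machine and pod names below are taken from this reproduction and will differ per cluster):

# Check whether a finalizer or lifecycle hook is holding the machine in Deleting.
oc get machine skundu-g3-hwnzk-master-0 -n openshift-machine-api \
  -o jsonpath='{.metadata.finalizers}{"\n"}{.spec.lifecycleHooks}{"\n"}'

# List etcd members from a healthy master's etcd pod to see whether the old
# master's member is still present.
oc rsh -n openshift-etcd etcd-skundu-g3-hwnzk-master-1.us-central1-b.c.openshift-qe.internal etcdctl member list -w table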
Actual results:
The new master machine reaches Running, but the old master machine is stuck in Deleting and many cluster operators are degraded.
Expected results:
The old master machine should be replaced by the new one, the Deleting machine should be removed, and the cluster operators should remain healthy.
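For reference, the replacement flow that should complete here (a hedged summary based on the components involved, not confirmed by this report): the MHC deletes the unhealthy master machine, the control plane machine set creates the replacement (visible above as skundu-g3-hwnzk-master-b9dzr-0), and the cluster-etcd-operator removes the old etcd member so the machine deletion can finish. The control plane machine set's view can be checked with:

oc get controlplanemachineset cluster -n openshift-machine-api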
Additional info:
is duplicated by:
OCPBUGS-17199 "CEO prevents member deletion during revision rollout" (status: New)