Type: Bug
Resolution: Duplicate
Priority: Major
Affects Version: 4.16
Impact: Quality / Stability / Reliability
Description of problem:
This is the same issue as https://issues.redhat.com/browse/OCPBUGS-17199: when the kubelet is stopped on a control plane node, the MachineHealthCheck brings up a replacement master, but the old machine stays stuck in the Deleting phase and many cluster operators become degraded.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-04-16-195622
How reproducible:
always
Steps to Reproduce:
1. Create an MHC for the control plane with the following manifest (mhc.yaml):
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: control-plane-health
  namespace: openshift-machine-api
spec:
  maxUnhealthy: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-type: master
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: "Unknown"
    timeout: 300s
    type: Ready
2. oc create -f <above mhc.yaml>
oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
control-plane-health              1              3                  3
machine-api-termination-handler   100%           3                  3
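(Optional check, not part of the original report: before stopping the kubelet, the MHC status can be inspected to confirm it is targeting the three masters; the resource name matches the manifest above.)
oc -n openshift-machine-api describe mhc control-plane-health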
3. Stop the kubelet service on one of the master nodes. A new master comes up Running, but the old machine stays stuck in Deleting and many cluster operators become degraded.
oc debug no/skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal
W0418 12:37:38.271299 23817 warnings.go:70] metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/skundu-g3-hwnzk-master-0us-central1-acopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1#
sh-5.1#
sh-5.1# systemctl stop kubelet
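(Illustrative only, not part of the original steps: after stopping the kubelet, the node going NotReady and the MHC-triggered replacement can be watched with the commands below.)
oc get nodes -w
oc -n openshift-machine-api get machines -w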
4. oc get machines
NAME                             PHASE      TYPE            REGION        ZONE            AGE
skundu-g3-hwnzk-master-0         Deleting   n2-standard-4   us-central1   us-central1-a   3h3m
skundu-g3-hwnzk-master-1         Running    n2-standard-4   us-central1   us-central1-b   3h3m
skundu-g3-hwnzk-master-2         Running    n2-standard-4   us-central1   us-central1-c   3h3m
skundu-g3-hwnzk-master-b9dzr-0   Running    n2-standard-4   us-central1   us-central1-a   118m
skundu-g3-hwnzk-worker-a-slw45   Running    n2-standard-4   us-central1   us-central1-a   175m
skundu-g3-hwnzk-worker-b-7p2vr   Running    n2-standard-4   us-central1   us-central1-b   175m
skundu-g3-hwnzk-worker-c-xs4ck   Running    n2-standard-4   us-central1   us-central1-c   175m
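(Illustrative diagnostic, not taken from the original report: the machine stuck in Deleting can be checked for its finalizers and the etcd preDrain lifecycle hook that has to be released before deletion can proceed; the field names assume the machine.openshift.io/v1beta1 Machine API.)
oc -n openshift-machine-api get machine skundu-g3-hwnzk-master-0 -o jsonpath='{.metadata.finalizers}{"\n"}{.spec.lifecycleHooks}{"\n"}'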
-------------------------------------------------------------------------------
oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.16.0-0.nightly-2024-04-16-195622 True True True 159m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()
OAuthServerDeploymentDegraded: 1 of 4 requested instances are unavailable for oauth-openshift.openshift-authentication ()
baremetal 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
cloud-controller-manager 4.16.0-0.nightly-2024-04-16-195622 True False False 3h2m
cloud-credential 4.16.0-0.nightly-2024-04-16-195622 True False False 3h5m
cluster-autoscaler 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
config-operator 4.16.0-0.nightly-2024-04-16-195622 True False False 179m
console 4.16.0-0.nightly-2024-04-16-195622 True False False 165m
control-plane-machine-set 4.16.0-0.nightly-2024-04-16-195622 True False False 116m
csi-snapshot-controller 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
dns 4.16.0-0.nightly-2024-04-16-195622 True True False 178m DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
etcd 4.16.0-0.nightly-2024-04-16-195622 True True True 177m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry 4.16.0-0.nightly-2024-04-16-195622 True True False 168m Progressing: The registry is ready
NodeCADaemonProgressing: The daemon set node-ca is deploying node pods
ingress 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
insights 4.16.0-0.nightly-2024-04-16-195622 True False False 172m
kube-apiserver 4.16.0-0.nightly-2024-04-16-195622 True True True 175m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-controller-manager 4.16.0-0.nightly-2024-04-16-195622 True False True 177m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler 4.16.0-0.nightly-2024-04-16-195622 True False True 176m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-storage-version-migrator 4.16.0-0.nightly-2024-04-16-195622 True False False 123m
machine-api 4.16.0-0.nightly-2024-04-16-195622 True False False 172m
machine-approver 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
machine-config 4.16.0-0.nightly-2024-04-16-195622 True False True 178m Failed to resync 4.16.0-0.nightly-2024-04-16-195622 because: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]
marketplace 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
monitoring 4.16.0-0.nightly-2024-04-16-195622 True False False 163m
network 4.16.0-0.nightly-2024-04-16-195622 True True False 3h DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
DaemonSet "/openshift-network-node-identity/network-node-identity" is not available (awaiting 1 nodes)
DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
node-tuning 4.16.0-0.nightly-2024-04-16-195622 True True False 117m Working towards "4.16.0-0.nightly-2024-04-16-195622"
openshift-apiserver 4.16.0-0.nightly-2024-04-16-195622 True True True 169m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
openshift-samples 4.16.0-0.nightly-2024-04-16-195622 True False False 173m
operator-lifecycle-manager 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
operator-lifecycle-manager-catalog 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
operator-lifecycle-manager-packageserver 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
service-ca 4.16.0-0.nightly-2024-04-16-195622 True False False 179m
storage 4.16.0-0.nightly-2024-04-16-195622 True True False 179m GCPPDCSIDriverOperatorCRProgressing: GCPPDDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
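(Since the duplicate OCPBUGS-17199 concerns the cluster-etcd-operator blocking member deletion during a revision rollout, the etcd pods and member list can also be checked; these commands are illustrative, and <etcd-pod> must be replaced with an actual pod name returned by the first command.)
oc -n openshift-etcd get pods -l app=etcd
oc -n openshift-etcd rsh -c etcdctl <etcd-pod> etcdctl member list -w table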
Actual results:
The old master machine remains stuck in the Deleting phase, and several cluster operators (authentication, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, machine-config, openshift-apiserver) report Degraded=True.
Expected results:
The old master machine should be deleted and replaced by the new one, and all cluster operators should return to a healthy, non-degraded state.
Additional info:
Is duplicated by:
OCPBUGS-17199 - CEO prevents member deletion during revision rollout (Closed)