Type: Bug
Resolution: Duplicate
Priority: Major
Affects Version: 4.16
Impact: Quality / Stability / Reliability
Description of problem:
This is the same issue as https://issues.redhat.com/browse/OCPBUGS-17199: when the kubelet is stopped on a control plane node, the MachineHealthCheck brings up a replacement master, but the old machine stays stuck in the Deleting phase and many cluster operators become degraded.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-04-16-195622
How reproducible:
always
Steps to Reproduce:
1. Create an MHC for the control plane with the following manifest (mhc.yaml):
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: control-plane-health
  namespace: openshift-machine-api
spec:
  maxUnhealthy: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-type: master
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: "Unknown"
    timeout: 300s
    type: Ready
2. oc create -f <above mhc.yaml>
oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
control-plane-health              1              3                  3
machine-api-termination-handler   100%           3                  3
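(Optional check, not part of the original report: before stopping the kubelet, the MHC status can be inspected to confirm it is targeting the three masters; the resource name matches the manifest above.)
oc -n openshift-machine-api describe mhc control-plane-health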
3. Stop the kubelet service on one of the master nodes. A new master comes up Running, but the old machine stays stuck in Deleting and many cluster operators become degraded.
oc debug no/skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal
W0418 12:37:38.271299 23817 warnings.go:70] metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
Starting pod/skundu-g3-hwnzk-master-0us-central1-acopenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1#
sh-5.1#
sh-5.1# systemctl stop kubelet
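(Illustrative only, not part of the original steps: after stopping the kubelet, the node going NotReady and the MHC-triggered replacement can be watched with the commands below.)
oc get nodes -w
oc -n openshift-machine-api get machines -w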
4. oc get machines
NAME                             PHASE      TYPE            REGION        ZONE            AGE
skundu-g3-hwnzk-master-0         Deleting   n2-standard-4   us-central1   us-central1-a   3h3m
skundu-g3-hwnzk-master-1         Running    n2-standard-4   us-central1   us-central1-b   3h3m
skundu-g3-hwnzk-master-2         Running    n2-standard-4   us-central1   us-central1-c   3h3m
skundu-g3-hwnzk-master-b9dzr-0   Running    n2-standard-4   us-central1   us-central1-a   118m
skundu-g3-hwnzk-worker-a-slw45   Running    n2-standard-4   us-central1   us-central1-a   175m
skundu-g3-hwnzk-worker-b-7p2vr   Running    n2-standard-4   us-central1   us-central1-b   175m
skundu-g3-hwnzk-worker-c-xs4ck   Running    n2-standard-4   us-central1   us-central1-c   175m
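(Illustrative diagnostic, not taken from the original report: the machine stuck in Deleting can be checked for its finalizers and the etcd preDrain lifecycle hook that has to be released before deletion can proceed; the field names assume the machine.openshift.io/v1beta1 Machine API.)
oc -n openshift-machine-api get machine skundu-g3-hwnzk-master-0 -o jsonpath='{.metadata.finalizers}{"\n"}{.spec.lifecycleHooks}{"\n"}'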
-------------------------------------------------------------------------------
oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.16.0-0.nightly-2024-04-16-195622 True True True 159m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()
OAuthServerDeploymentDegraded: 1 of 4 requested instances are unavailable for oauth-openshift.openshift-authentication ()
baremetal 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
cloud-controller-manager 4.16.0-0.nightly-2024-04-16-195622 True False False 3h2m
cloud-credential 4.16.0-0.nightly-2024-04-16-195622 True False False 3h5m
cluster-autoscaler 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
config-operator 4.16.0-0.nightly-2024-04-16-195622 True False False 179m
console 4.16.0-0.nightly-2024-04-16-195622 True False False 165m
control-plane-machine-set 4.16.0-0.nightly-2024-04-16-195622 True False False 116m
csi-snapshot-controller 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
dns 4.16.0-0.nightly-2024-04-16-195622 True True False 178m DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
etcd 4.16.0-0.nightly-2024-04-16-195622 True True True 177m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry 4.16.0-0.nightly-2024-04-16-195622 True True False 168m Progressing: The registry is ready
NodeCADaemonProgressing: The daemon set node-ca is deploying node pods
ingress 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
insights 4.16.0-0.nightly-2024-04-16-195622 True False False 172m
kube-apiserver 4.16.0-0.nightly-2024-04-16-195622 True True True 175m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-controller-manager 4.16.0-0.nightly-2024-04-16-195622 True False True 177m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler 4.16.0-0.nightly-2024-04-16-195622 True False True 176m NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-storage-version-migrator 4.16.0-0.nightly-2024-04-16-195622 True False False 123m
machine-api 4.16.0-0.nightly-2024-04-16-195622 True False False 172m
machine-approver 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
machine-config 4.16.0-0.nightly-2024-04-16-195622 True False True 178m Failed to resync 4.16.0-0.nightly-2024-04-16-195622 because: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]
marketplace 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
monitoring 4.16.0-0.nightly-2024-04-16-195622 True False False 163m
network 4.16.0-0.nightly-2024-04-16-195622 True True False 3h DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
DaemonSet "/openshift-network-node-identity/network-node-identity" is not available (awaiting 1 nodes)
DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
node-tuning 4.16.0-0.nightly-2024-04-16-195622 True True False 117m Working towards "4.16.0-0.nightly-2024-04-16-195622"
openshift-apiserver 4.16.0-0.nightly-2024-04-16-195622 True True True 169m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
openshift-samples 4.16.0-0.nightly-2024-04-16-195622 True False False 173m
operator-lifecycle-manager 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
operator-lifecycle-manager-catalog 4.16.0-0.nightly-2024-04-16-195622 True False False 178m
operator-lifecycle-manager-packageserver 4.16.0-0.nightly-2024-04-16-195622 True False False 169m
service-ca 4.16.0-0.nightly-2024-04-16-195622 True False False 179m
storage 4.16.0-0.nightly-2024-04-16-195622 True True False 179m GCPPDCSIDriverOperatorCRProgressing: GCPPDDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
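(Since the duplicate OCPBUGS-17199 concerns the cluster-etcd-operator blocking member deletion during a revision rollout, the etcd pods and member list can also be checked; these commands are illustrative, and <etcd-pod> must be replaced with an actual pod name returned by the first command.)
oc -n openshift-etcd get pods -l app=etcd
oc -n openshift-etcd rsh -c etcdctl <etcd-pod> etcdctl member list -w table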
Actual results:
The old master machine remains stuck in the Deleting phase, and several cluster operators (authentication, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, machine-config, openshift-apiserver) report Degraded=True.
Expected results:
The old master machine should be deleted and replaced by the new one, and all cluster operators should return to a healthy, non-degraded state.
Additional info:
Is duplicated by:
OCPBUGS-17199 - CEO prevents member deletion during revision rollout (Closed)