OpenShift Bugs / OCPBUGS-32403

CEO prevents member deletion during revision rollout


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 4.16
    • Component/s: Etcd
    • Labels: None
    • Severity: Moderate
    • Yes
    • Rejected
    • False

    Description

      Description of problem:

          This is the same issue as https://issues.redhat.com/browse/OCPBUGS-17199.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          always

      Steps to Reproduce:

          1. Create a MachineHealthCheck (MHC) for the control plane:
      
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineHealthCheck
      metadata:
        name: control-plane-health
        namespace: openshift-machine-api
      spec:
        maxUnhealthy: 1
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-machine-type: master
        unhealthyConditions:
        - status: "False"
          timeout: 300s
          type: Ready
        - status: "Unknown"
          timeout: 300s
          type: Ready     
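      The `maxUnhealthy: 1` field above gates remediation: the MHC deletes an unhealthy machine only while the number of unhealthy targets stays at or below the threshold. A minimal illustrative sketch of that gating check (not the actual machine-api controller code):

```python
def remediation_allowed(unhealthy: int, max_unhealthy: int) -> bool:
    """MHC short-circuits remediation when too many targets are unhealthy,
    to avoid cascading deletion of control-plane machines."""
    return unhealthy <= max_unhealthy

# With maxUnhealthy: 1 and three masters, stopping one kubelet
# (one unhealthy machine) still permits remediation:
print(remediation_allowed(1, 1))   # True
print(remediation_allowed(2, 1))   # False
```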
      
      2. oc create -f <above mhc.yaml>
      oc get mhc
      NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
      control-plane-health              1              3                  3
      machine-api-termination-handler   100%           3                  3
            
      3. Stop the kubelet service on the master node. A new master machine reaches Running, but the old one is stuck in Deleting and many cluster operators become degraded.
      oc debug no/skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal
      W0418 12:37:38.271299   23817 warnings.go:70] metadata.name: this is used in the Pod's hostname, which can result in surprising behavior; a DNS label is recommended: [must be no more than 63 characters]
      Starting pod/skundu-g3-hwnzk-master-0us-central1-acopenshift-qeinternal-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.0.3
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-5.1# 
      sh-5.1# 
      sh-5.1# systemctl stop kubelet
      
      
      
      4. oc get machines
      NAME                             PHASE      TYPE            REGION        ZONE            AGE
      skundu-g3-hwnzk-master-0         Deleting   n2-standard-4   us-central1   us-central1-a   3h3m
      skundu-g3-hwnzk-master-1         Running    n2-standard-4   us-central1   us-central1-b   3h3m
      skundu-g3-hwnzk-master-2         Running    n2-standard-4   us-central1   us-central1-c   3h3m
      skundu-g3-hwnzk-master-b9dzr-0   Running    n2-standard-4   us-central1   us-central1-a   118m
      skundu-g3-hwnzk-worker-a-slw45   Running    n2-standard-4   us-central1   us-central1-a   175m
      skundu-g3-hwnzk-worker-b-7p2vr   Running    n2-standard-4   us-central1   us-central1-b   175m
      skundu-g3-hwnzk-worker-c-xs4ck   Running    n2-standard-4   us-central1   us-central1-c   175m
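      A machine stuck in the Deleting phase, as above, can be detected programmatically. A small sketch that parses plain `oc get machines` tabular output (the sample is abbreviated from this report):

```python
def stuck_machines(oc_output: str, phase: str = "Deleting") -> list[str]:
    """Return machine names whose PHASE column matches `phase`
    from plain `oc get machines` tabular output."""
    names = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 2 and cols[1] == phase:
            names.append(cols[0])
    return names

sample = """\
NAME                             PHASE      TYPE            REGION        ZONE            AGE
skundu-g3-hwnzk-master-0         Deleting   n2-standard-4   us-central1   us-central1-a   3h3m
skundu-g3-hwnzk-master-1         Running    n2-standard-4   us-central1   us-central1-b   3h3m
skundu-g3-hwnzk-master-b9dzr-0   Running    n2-standard-4   us-central1   us-central1-a   118m
"""
print(stuck_machines(sample))  # ['skundu-g3-hwnzk-master-0']
```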
      
      -------------------------------------------------------------------------------
       oc get co
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication   4.16.0-0.nightly-2024-04-16-195622   True        True          True       159m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()
      OAuthServerDeploymentDegraded: 1 of 4 requested instances are unavailable for oauth-openshift.openshift-authentication ()
      baremetal                   4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m    
      cloud-controller-manager    4.16.0-0.nightly-2024-04-16-195622   True        False         False      3h2m    
      cloud-credential            4.16.0-0.nightly-2024-04-16-195622   True        False         False      3h5m    
      cluster-autoscaler          4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m    
      config-operator             4.16.0-0.nightly-2024-04-16-195622   True        False         False      179m    
      console                     4.16.0-0.nightly-2024-04-16-195622   True        False         False      165m    
      control-plane-machine-set   4.16.0-0.nightly-2024-04-16-195622   True        False         False      116m    
      csi-snapshot-controller     4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m    
      dns                         4.16.0-0.nightly-2024-04-16-195622   True        True          False      178m    DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
      etcd                        4.16.0-0.nightly-2024-04-16-195622   True        True          True       177m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      image-registry              4.16.0-0.nightly-2024-04-16-195622   True        True          False      168m    Progressing: The registry is ready
      NodeCADaemonProgressing: The daemon set node-ca is deploying node pods
      ingress                         4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m    
      insights                        4.16.0-0.nightly-2024-04-16-195622   True        False         False      172m    
      kube-apiserver                  4.16.0-0.nightly-2024-04-16-195622   True        True          True       175m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-controller-manager         4.16.0-0.nightly-2024-04-16-195622   True        False         True       177m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-scheduler                  4.16.0-0.nightly-2024-04-16-195622   True        False         True       176m    NodeControllerDegraded: The master nodes not ready: node "skundu-g3-hwnzk-master-0.us-central1-a.c.openshift-qe.internal" not ready since 2024-04-18 07:11:06 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-storage-version-migrator   4.16.0-0.nightly-2024-04-16-195622   True        False         False      123m    
      machine-api                     4.16.0-0.nightly-2024-04-16-195622   True        False         False      172m    
      machine-approver                4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m    
      machine-config                  4.16.0-0.nightly-2024-04-16-195622   True        False         True       178m    Failed to resync 4.16.0-0.nightly-2024-04-16-195622 because: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]
      marketplace                     4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m    
      monitoring                      4.16.0-0.nightly-2024-04-16-195622   True        False         False      163m    
      network                         4.16.0-0.nightly-2024-04-16-195622   True        True          False      3h      DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
      DaemonSet "/openshift-network-node-identity/network-node-identity" is not available (awaiting 1 nodes)
      DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
      DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
      DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
      DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
      node-tuning                                4.16.0-0.nightly-2024-04-16-195622   True        True          False      117m    Working towards "4.16.0-0.nightly-2024-04-16-195622"
      openshift-apiserver                        4.16.0-0.nightly-2024-04-16-195622   True        True          True       169m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver ()
      openshift-controller-manager               4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m    
      openshift-samples                          4.16.0-0.nightly-2024-04-16-195622   True        False         False      173m    
      operator-lifecycle-manager                 4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m    
      operator-lifecycle-manager-catalog         4.16.0-0.nightly-2024-04-16-195622   True        False         False      178m    
      operator-lifecycle-manager-packageserver   4.16.0-0.nightly-2024-04-16-195622   True        False         False      169m    
      service-ca                                 4.16.0-0.nightly-2024-04-16-195622   True        False         False      179m    
      storage                                    4.16.0-0.nightly-2024-04-16-195622   True        True          False      179m    GCPPDCSIDriverOperatorCRProgressing: GCPPDDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
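      The degraded operators in the table above can also be listed from `oc get clusteroperators -o json`. A hedged sketch filtering on the standard ClusterOperator `status.conditions` schema (the sample data is abbreviated from this report):

```python
def degraded_operators(co_json: dict) -> list[str]:
    """Names of ClusterOperators whose Degraded condition is True."""
    names = []
    for item in co_json.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                names.append(item["metadata"]["name"])
    return names

sample = {
    "items": [
        {"metadata": {"name": "etcd"},
         "status": {"conditions": [{"type": "Degraded", "status": "True"}]}},
        {"metadata": {"name": "dns"},
         "status": {"conditions": [{"type": "Degraded", "status": "False"}]}},
        {"metadata": {"name": "kube-apiserver"},
         "status": {"conditions": [{"type": "Degraded", "status": "True"}]}},
    ]
}
print(degraded_operators(sample))  # ['etcd', 'kube-apiserver']
```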
      
      
      
         

      Actual results:

          

      Expected results:

          The old node should be replaced by the new one and the cluster operators should return to a healthy state.

      Additional info:

          


            People

              Assignee: Thomas Jungblut (tjungblu@redhat.com)
              Reporter: Sandeep Kundu (rhn-support-skundu)
              ge liu
              Votes: 0
              Watchers: 4
