  1. OpenShift Bugs
  2. OCPBUGS-14673

MHC for control plane cannot work right for some cases


    • Important
    • Yes
    • CLOUD Sprint 239
    • 1
    • Rejected
    • False

      This is a regression in behaviour that shipped in a previous release, so it has been broken for some time.


      Description of problem:

      The MachineHealthCheck (MHC) for the control plane does not work correctly in some cases.
      Three cases were tried:
      1. Terminate/delete a master on the cloud provider console: no issue.
      2. Stop the kubelet service on a master node: a new master reaches Running, but the old one is stuck in Deleting and many cluster operators (co) are degraded.
      3. Delete a master node: the old machine is stuck in Deleting and no new machine is created.
      
      This is a regression: when I tested this on 4.12 around September 2022, cases 2 and 3 worked correctly.
      https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-54326

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-06-05-112833
      4.13.0-0.nightly-2023-06-06-194351
      4.12.0-0.nightly-2023-06-07-005319

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create an MHC for the control plane
      
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineHealthCheck
      metadata:
        name: control-plane-health
        namespace: openshift-machine-api
      spec:
        maxUnhealthy: 1
        selector:
          matchLabels:
            machine.openshift.io/cluster-api-machine-type: master
        unhealthyConditions:
        - status: "False"
          timeout: 300s
          type: Ready
        - status: "Unknown"
          timeout: 300s
          type: Ready
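To make the two `unhealthyConditions` entries concrete, here is a minimal Python sketch of how such a rule can be evaluated: a node counts as unhealthy once its `Ready` condition has held one of the listed statuses for longer than the timeout. This is an illustration under stated assumptions, not the machine-healthcheck controller's code. `NodeCondition`, `UNHEALTHY_CONDITIONS`, and `is_unhealthy` are hypothetical names, and the timestamp is made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NodeCondition:
    """Hypothetical stand-in for one entry of a node's status conditions."""
    type: str
    status: str
    last_transition: datetime


# Mirrors the two unhealthyConditions entries of the manifest: (type, status, timeout).
UNHEALTHY_CONDITIONS = [
    ("Ready", "False", timedelta(seconds=300)),
    ("Ready", "Unknown", timedelta(seconds=300)),
]


def is_unhealthy(conditions, now):
    """Return True if any condition matched a rule for longer than its timeout."""
    for cond in conditions:
        for ctype, status, timeout in UNHEALTHY_CONDITIONS:
            matches = cond.type == ctype and cond.status == status
            if matches and now - cond.last_transition > timeout:
                return True
    return False


# Illustrative timestamp: the moment the kubelet stops reporting.
t0 = datetime(2023, 1, 1, 12, 0, 0)
stopped = [NodeCondition("Ready", "Unknown", t0)]
print(is_unhealthy(stopped, t0 + timedelta(seconds=120)))  # False: still inside the 300s window
print(is_unhealthy(stopped, t0 + timedelta(seconds=301)))  # True: timeout exceeded
```

With the 300s timeouts above, stopping the kubelet should therefore take a little over five minutes to trigger remediation, which matches the delay visible in Case 2.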
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f mhc-master3.yaml 
      machinehealthcheck.machine.openshift.io/control-plane-health created
      liuhuali@Lius-MacBook-Pro huali-test % oc get mhc
      NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
      control-plane-health              1              3                  3
      machine-api-termination-handler   100%           0                  0 
      
      Case 1: Terminate/delete a master on the cloud provider console. No issue: the old machine is deleted and a replacement master reaches Running.
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS   ROLES                  AGE     VERSION
      huliu-az7c-svq9q-master-0              Ready    control-plane,master   70m     v1.26.5+7a891f0
      huliu-az7c-svq9q-master-1              Ready    control-plane,master   69m     v1.26.5+7a891f0
      huliu-az7c-svq9q-master-2              Ready    control-plane,master   69m     v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-5r8jf   Ready    worker                 8m48s   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-k747l   Ready    worker                 21m     v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-r2vdn   Ready    worker                 57m     v1.26.5+7a891f0
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS   ROLES                  AGE     VERSION
      huliu-az7c-svq9q-master-1              Ready    control-plane,master   70m     v1.26.5+7a891f0
      huliu-az7c-svq9q-master-2              Ready    control-plane,master   70m     v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-5r8jf   Ready    worker                 9m28s   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-k747l   Ready    worker                 22m     v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-r2vdn   Ready    worker                 58m     v1.26.5+7a891f0
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE      TYPE              REGION   ZONE   AGE
      huliu-az7c-svq9q-master-0              Deleting   Standard_D8s_v3   westus          72m
      huliu-az7c-svq9q-master-1              Running    Standard_D8s_v3   westus          72m
      huliu-az7c-svq9q-master-2              Running    Standard_D8s_v3   westus          72m
      huliu-az7c-svq9q-worker-westus-5r8jf   Running    Standard_D4s_v3   westus          15m
      huliu-az7c-svq9q-worker-westus-k747l   Running    Standard_D4s_v3   westus          28m
      huliu-az7c-svq9q-worker-westus-r2vdn   Running    Standard_D4s_v3   westus          66m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE          TYPE              REGION   ZONE   AGE
      huliu-az7c-svq9q-master-1              Running        Standard_D8s_v3   westus          74m
      huliu-az7c-svq9q-master-2              Running        Standard_D8s_v3   westus          74m
      huliu-az7c-svq9q-master-c96k8-0        Provisioning                                     2s
      huliu-az7c-svq9q-worker-westus-5r8jf   Running        Standard_D4s_v3   westus          16m
      huliu-az7c-svq9q-worker-westus-k747l   Running        Standard_D4s_v3   westus          29m
      huliu-az7c-svq9q-worker-westus-r2vdn   Running        Standard_D4s_v3   westus          67m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE     TYPE              REGION   ZONE   AGE
      huliu-az7c-svq9q-master-1              Running   Standard_D8s_v3   westus          95m
      huliu-az7c-svq9q-master-2              Running   Standard_D8s_v3   westus          95m
      huliu-az7c-svq9q-master-c96k8-0        Running   Standard_D8s_v3   westus          21m
      huliu-az7c-svq9q-worker-westus-5r8jf   Running   Standard_D4s_v3   westus          37m
      huliu-az7c-svq9q-worker-westus-k747l   Running   Standard_D4s_v3   westus          50m
      huliu-az7c-svq9q-worker-westus-r2vdn   Running   Standard_D4s_v3   westus          88m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS   ROLES                  AGE   VERSION
      huliu-az7c-svq9q-master-1              Ready    control-plane,master   93m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-2              Ready    control-plane,master   93m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-c96k8-0        Ready    control-plane,master   17m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-5r8jf   Ready    worker                 32m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-k747l   Ready    worker                 44m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-r2vdn   Ready    worker                 80m   v1.26.5+7a891f0
      
      Case 2: Stop the kubelet service on the master node. The new master reaches Running, but the old one is stuck in Deleting and many cluster operators (co) are degraded.
      liuhuali@Lius-MacBook-Pro huali-test % oc debug node/huliu-az7c-svq9q-master-1 
      Starting pod/huliu-az7c-svq9q-master-1-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.0.6
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-5.1# systemctl stop kubelet
      
      
      Removing debug pod ...
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS   ROLES                  AGE   VERSION
      huliu-az7c-svq9q-master-1              Ready    control-plane,master   95m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-2              Ready    control-plane,master   95m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-c96k8-0        Ready    control-plane,master   19m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-5r8jf   Ready    worker                 34m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-k747l   Ready    worker                 47m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-r2vdn   Ready    worker                 83m   v1.26.5+7a891f0
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE     TYPE              REGION   ZONE   AGE
      huliu-az7c-svq9q-master-1              Running   Standard_D8s_v3   westus          97m
      huliu-az7c-svq9q-master-2              Running   Standard_D8s_v3   westus          97m
      huliu-az7c-svq9q-master-c96k8-0        Running   Standard_D8s_v3   westus          23m
      huliu-az7c-svq9q-worker-westus-5r8jf   Running   Standard_D4s_v3   westus          39m
      huliu-az7c-svq9q-worker-westus-k747l   Running   Standard_D4s_v3   westus          53m
      huliu-az7c-svq9q-worker-westus-r2vdn   Running   Standard_D4s_v3   westus          91m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS     ROLES                  AGE     VERSION
      huliu-az7c-svq9q-master-1              NotReady   control-plane,master   107m    v1.26.5+7a891f0
      huliu-az7c-svq9q-master-2              Ready      control-plane,master   107m    v1.26.5+7a891f0
      huliu-az7c-svq9q-master-c96k8-0        Ready      control-plane,master   32m     v1.26.5+7a891f0
      huliu-az7c-svq9q-master-jdhgg-1        Ready      control-plane,master   2m10s   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-5r8jf   Ready      worker                 46m     v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-k747l   Ready      worker                 59m     v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-r2vdn   Ready      worker                 95m     v1.26.5+7a891f0
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE      TYPE              REGION   ZONE   AGE
      huliu-az7c-svq9q-master-1              Deleting   Standard_D8s_v3   westus          110m
      huliu-az7c-svq9q-master-2              Running    Standard_D8s_v3   westus          110m
      huliu-az7c-svq9q-master-c96k8-0        Running    Standard_D8s_v3   westus          36m
      huliu-az7c-svq9q-master-jdhgg-1        Running    Standard_D8s_v3   westus          5m55s
      huliu-az7c-svq9q-worker-westus-5r8jf   Running    Standard_D4s_v3   westus          52m
      huliu-az7c-svq9q-worker-westus-k747l   Running    Standard_D4s_v3   westus          65m
      huliu-az7c-svq9q-worker-westus-r2vdn   Running    Standard_D4s_v3   westus          103m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE      TYPE              REGION   ZONE   AGE
      huliu-az7c-svq9q-master-1              Deleting   Standard_D8s_v3   westus          3h
      huliu-az7c-svq9q-master-2              Running    Standard_D8s_v3   westus          3h
      huliu-az7c-svq9q-master-c96k8-0        Running    Standard_D8s_v3   westus          105m
      huliu-az7c-svq9q-master-jdhgg-1        Running    Standard_D8s_v3   westus          75m
      huliu-az7c-svq9q-worker-westus-5r8jf   Running    Standard_D4s_v3   westus          122m
      huliu-az7c-svq9q-worker-westus-k747l   Running    Standard_D4s_v3   westus          135m
      huliu-az7c-svq9q-worker-westus-r2vdn   Running    Standard_D4s_v3   westus          173m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node   
      NAME                                   STATUS     ROLES                  AGE    VERSION
      huliu-az7c-svq9q-master-1              NotReady   control-plane,master   178m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-2              Ready      control-plane,master   178m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-c96k8-0        Ready      control-plane,master   102m   v1.26.5+7a891f0
      huliu-az7c-svq9q-master-jdhgg-1        Ready      control-plane,master   72m    v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-5r8jf   Ready      worker                 116m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-k747l   Ready      worker                 129m   v1.26.5+7a891f0
      huliu-az7c-svq9q-worker-westus-r2vdn   Ready      worker                 165m   v1.26.5+7a891f0
      liuhuali@Lius-MacBook-Pro huali-test % oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-0.nightly-2023-06-06-194351   True        True          True       107m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
      baremetal                                  4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
      cloud-controller-manager                   4.13.0-0.nightly-2023-06-06-194351   True        False         False      176m    
      cloud-credential                           4.13.0-0.nightly-2023-06-06-194351   True        False         False      3h      
      cluster-autoscaler                         4.13.0-0.nightly-2023-06-06-194351   True        False         False      173m    
      config-operator                            4.13.0-0.nightly-2023-06-06-194351   True        False         False      175m    
      console                                    4.13.0-0.nightly-2023-06-06-194351   True        False         False      136m    
      control-plane-machine-set                  4.13.0-0.nightly-2023-06-06-194351   True        False         False      71m     
      csi-snapshot-controller                    4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
      dns                                        4.13.0-0.nightly-2023-06-06-194351   True        True          False      173m    DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
      etcd                                       4.13.0-0.nightly-2023-06-06-194351   True        True          True       173m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      image-registry                             4.13.0-0.nightly-2023-06-06-194351   True        True          False      165m    Progressing: The registry is ready...
      ingress                                    4.13.0-0.nightly-2023-06-06-194351   True        False         False      165m    
      insights                                   4.13.0-0.nightly-2023-06-06-194351   True        False         False      168m    
      kube-apiserver                             4.13.0-0.nightly-2023-06-06-194351   True        True          True       171m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-controller-manager                    4.13.0-0.nightly-2023-06-06-194351   True        False         True       171m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-scheduler                             4.13.0-0.nightly-2023-06-06-194351   True        False         True       171m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
      kube-storage-version-migrator              4.13.0-0.nightly-2023-06-06-194351   True        False         False      106m    
      machine-api                                4.13.0-0.nightly-2023-06-06-194351   True        False         False      167m    
      machine-approver                           4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
      machine-config                             4.13.0-0.nightly-2023-06-06-194351   False       False         True       60m     Cluster not available for [{operator 4.13.0-0.nightly-2023-06-06-194351}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]
      marketplace                                4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
      monitoring                                 4.13.0-0.nightly-2023-06-06-194351   True        False         False      106m    
      network                                    4.13.0-0.nightly-2023-06-06-194351   True        True          False      177m    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)...
      node-tuning                                4.13.0-0.nightly-2023-06-06-194351   True        False         False      173m    
      openshift-apiserver                        4.13.0-0.nightly-2023-06-06-194351   True        True          True       107m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver ()
      openshift-controller-manager               4.13.0-0.nightly-2023-06-06-194351   True        False         False      170m    
      openshift-samples                          4.13.0-0.nightly-2023-06-06-194351   True        False         False      167m    
      operator-lifecycle-manager                 4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
      operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
      operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-06-06-194351   True        False         False      168m    
      service-ca                                 4.13.0-0.nightly-2023-06-06-194351   True        False         False      175m    
      storage                                    4.13.0-0.nightly-2023-06-06-194351   True        True          False      174m    AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...
      liuhuali@Lius-MacBook-Pro huali-test % 
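The `oc get co` table above is long, so a small filter helps pick out the degraded operators. The following Python sketch parses the plain-text output and returns the names whose DEGRADED column is `True`. The function name `degraded_operators` is hypothetical, and the sample rows are copied from the output above with the column padding collapsed.

```python
def degraded_operators(co_output):
    """Return the names of cluster operators whose DEGRADED column is True."""
    lines = [ln for ln in co_output.splitlines() if ln.strip()]
    names = []
    for line in lines[1:]:  # skip the header row
        # Columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE]
        fields = line.split(None, 6)
        if len(fields) < 6:
            continue  # not a data row
        if fields[4] == "True":
            names.append(fields[0])
    return names


SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.13.0-0.nightly-2023-06-06-194351 True True True 107m APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
baremetal 4.13.0-0.nightly-2023-06-06-194351 True False False 174m
dns 4.13.0-0.nightly-2023-06-06-194351 True True False 173m DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
kube-scheduler 4.13.0-0.nightly-2023-06-06-194351 True False True 171m NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
"""

print(degraded_operators(SAMPLE))  # ['authentication', 'kube-scheduler']
```

Applied to the full Case 2 table, the same filter would list authentication, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, machine-config, and openshift-apiserver.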
      
      Case 3: Delete a master node. The old machine is stuck in Deleting and no new machine is created.
      liuhuali@Lius-MacBook-Pro huali-test % oc delete node huliu-az7a-d2lgf-master-0 
      node "huliu-az7a-d2lgf-master-0" deleted
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE      TYPE              REGION   ZONE   AGE
      huliu-az7a-d2lgf-master-0              Deleting   Standard_D8s_v3   westus          4h58m
      huliu-az7a-d2lgf-master-1              Running    Standard_D8s_v3   westus          4h58m
      huliu-az7a-d2lgf-master-rjxg9-2        Running    Standard_D8s_v3   westus          27m
      huliu-az7a-d2lgf-worker-westus-blbrj   Running    Standard_D4s_v3   westus          4h49m
      huliu-az7a-d2lgf-worker-westus-r7g5l   Running    Standard_D4s_v3   westus          4h15m
      huliu-az7a-d2lgf-worker-westus-rrc97   Running    Standard_D4s_v3   westus          4h49m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS   ROLES                  AGE     VERSION
      huliu-az7a-d2lgf-master-1              Ready    control-plane,master   4h53m   v1.27.2+cc041e8
      huliu-az7a-d2lgf-master-rjxg9-2        Ready    control-plane,master   23m     v1.27.2+cc041e8
      huliu-az7a-d2lgf-worker-westus-blbrj   Ready    worker                 4h38m   v1.27.2+cc041e8
      huliu-az7a-d2lgf-worker-westus-r7g5l   Ready    worker                 4h8m    v1.27.2+cc041e8
      huliu-az7a-d2lgf-worker-westus-rrc97   Ready    worker                 4h38m   v1.27.2+cc041e8
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                   PHASE      TYPE              REGION   ZONE   AGE
      huliu-az7a-d2lgf-master-0              Deleting   Standard_D8s_v3   westus          8h
      huliu-az7a-d2lgf-master-1              Running    Standard_D8s_v3   westus          8h
      huliu-az7a-d2lgf-master-rjxg9-2        Running    Standard_D8s_v3   westus          4h27m
      huliu-az7a-d2lgf-worker-westus-8wnmq   Running    Standard_D4s_v3   westus          3h20m
      huliu-az7a-d2lgf-worker-westus-df6wj   Running    Standard_D4s_v3   westus          167m
      huliu-az7a-d2lgf-worker-westus-r7g5l   Running    Standard_D4s_v3   westus          8h
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                   STATUS   ROLES                  AGE     VERSION
      huliu-az7a-d2lgf-master-1              Ready    control-plane,master   8h      v1.27.2+cc041e8
      huliu-az7a-d2lgf-master-rjxg9-2        Ready    control-plane,master   4h23m   v1.27.2+cc041e8
      huliu-az7a-d2lgf-worker-westus-8wnmq   Ready    worker                 3h15m   v1.27.2+cc041e8
      huliu-az7a-d2lgf-worker-westus-df6wj   Ready    worker                 158m    v1.27.2+cc041e8
      huliu-az7a-d2lgf-worker-westus-r7g5l   Ready    worker                 8h      v1.27.2+cc041e8
      liuhuali@Lius-MacBook-Pro huali-test % oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.0-0.nightly-2023-06-05-112833   True        False         False      45m     
      baremetal                                  4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      cloud-controller-manager                   4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      cloud-credential                           4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      cluster-autoscaler                         4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      config-operator                            4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      console                                    4.14.0-0.nightly-2023-06-05-112833   True        False         False      72m     
      control-plane-machine-set                  4.14.0-0.nightly-2023-06-05-112833   True        False         False      4h21m   
      csi-snapshot-controller                    4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      dns                                        4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      etcd                                       4.14.0-0.nightly-2023-06-05-112833   True        False         True       8h      EtcdEndpointsDegraded: EtcdEndpointsController can't evaluate whether quorum is safe: CheckSafeToScaleCluster 3 nodes are required, but only 2 are available
      image-registry                             4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      ingress                                    4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      insights                                   4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      kube-apiserver                             4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      kube-controller-manager                    4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      kube-scheduler                             4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      kube-storage-version-migrator              4.14.0-0.nightly-2023-06-05-112833   True        False         False      4h30m   
      machine-api                                4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      machine-approver                           4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      machine-config                             4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      marketplace                                4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      monitoring                                 4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      network                                    4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      node-tuning                                4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      openshift-apiserver                        4.14.0-0.nightly-2023-06-05-112833   True        False         False      32m     
      openshift-controller-manager               4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      openshift-samples                          4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      operator-lifecycle-manager                 4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-06-05-112833   False       True          False      166m    ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: ComponentUnhealthy, message: apiServices not installed
      service-ca                                 4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      storage                                    4.14.0-0.nightly-2023-06-05-112833   True        False         False      8h      
      liuhuali@Lius-MacBook-Pro huali-test % 
      liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-5c58f965fd-nwjxh -c machine-healthcheck-controller
      …
      I0607 10:13:57.348074       1 machinehealthcheck_controller.go:162] Reconciling openshift-machine-api/control-plane-health
      I0607 10:13:57.348318       1 machinehealthcheck_controller.go:188] Reconciling openshift-machine-api/control-plane-health: finding targets
      I0607 10:13:57.348601       1 machinehealthcheck_controller.go:465] Reconciling openshift-machine-api/control-plane-health/huliu-az7a-d2lgf-master-0/huliu-az7a-d2lgf-master-0: health checking
      I0607 10:13:57.348615       1 machinehealthcheck_controller.go:465] Reconciling openshift-machine-api/control-plane-health/huliu-az7a-d2lgf-master-1/huliu-az7a-d2lgf-master-1: health checking
      I0607 10:13:57.348624       1 machinehealthcheck_controller.go:465] Reconciling openshift-machine-api/control-plane-health/huliu-az7a-d2lgf-master-rjxg9-2/huliu-az7a-d2lgf-master-rjxg9-2: health checking
      I0607 10:13:57.348636       1 machinehealthcheck_controller.go:251] Remediations are allowed for openshift-machine-api/control-plane-health: total targets: 3,  max unhealthy: 1, unhealthy targets: 1
      I0607 10:13:57.360972       1 machinehealthcheck_controller.go:287] Reconciling openshift-machine-api/control-plane-health/huliu-az7a-d2lgf-master-0/huliu-az7a-d2lgf-master-0: meet unhealthy criteria, triggers remediation
      I0607 10:13:57.361004       1 machinehealthcheck_controller.go:628]  openshift-machine-api/control-plane-health/huliu-az7a-d2lgf-master-0/huliu-az7a-d2lgf-master-0: start remediation logic
      I0607 10:13:57.361060       1 machinehealthcheck_controller.go:279] Reconciling openshift-machine-api/control-plane-health: no more targets meet unhealthy criteria
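The log line "Remediations are allowed ... total targets: 3, max unhealthy: 1, unhealthy targets: 1" reflects the `maxUnhealthy` gate. As a rough Python sketch of that gate (not the controller's actual code): remediation proceeds only while the number of unhealthy targets does not exceed `maxUnhealthy`, which may be an integer or a percentage such as the `100%` shown for `machine-api-termination-handler`. The function names are hypothetical, and rounding percentages down is an assumption.

```python
def max_unhealthy_count(max_unhealthy, total):
    """Resolve maxUnhealthy (an int or a 'NN%' string) to an absolute count."""
    if isinstance(max_unhealthy, str) and max_unhealthy.endswith("%"):
        percent = int(max_unhealthy[:-1])
        return total * percent // 100  # assumption: percentages round down
    return int(max_unhealthy)


def remediation_allowed(total, unhealthy, max_unhealthy):
    """True while the unhealthy count stays within the maxUnhealthy budget."""
    return unhealthy <= max_unhealthy_count(max_unhealthy, total)


# Values from the log above: 3 targets, 1 unhealthy, maxUnhealthy: 1.
print(remediation_allowed(total=3, unhealthy=1, max_unhealthy=1))  # True
# A second unhealthy master would exceed the budget and block remediation.
print(remediation_allowed(total=3, unhealthy=2, max_unhealthy=1))  # False
```

In Case 3 the gate passes and remediation starts, so the missing replacement is not explained by `maxUnhealthy`.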
      
      liuhuali@Lius-MacBook-Pro huali-test % oc logs control-plane-machine-set-operator-7cb46c5fdd-4lgrc
      …
      I0607 10:14:48.262167       1 controller.go:157]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="26c85d60-742b-48e9-b162-a398b170eb4c"
      I0607 10:14:48.332215       1 controller.go:210]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="26c85d60-742b-48e9-b162-a398b170eb4c"
      E0607 10:14:48.332285       1 controller.go:329]  "msg"="Reconciler error" "error"="error reconciling control plane machine set: error fetching machine info: could not generate machine info for machine huliu-az7a-d2lgf-master-0: error checking machine readiness: failed to get Node \"huliu-az7a-d2lgf-master-0\": Node \"huliu-az7a-d2lgf-master-0\" not found" "controller"="controlplanemachineset" "reconcileID"="26c85d60-742b-48e9-b162-a398b170eb4c" 
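The reconciler error above occurs because the machine `huliu-az7a-d2lgf-master-0` still exists while its node is gone. A short Python sketch can spot that state from the plain-text `oc get machine` and `oc get node` output. It assumes machine and node names are identical, as they are in this cluster; in general the link is the machine's node reference. The helper names are hypothetical, and the sample rows are copied from the Case 3 output.

```python
def first_two_columns(table):
    """Map the first column (NAME) to the second (PHASE or STATUS)."""
    rows = [ln.split() for ln in table.splitlines() if ln.strip()]
    return {r[0]: r[1] for r in rows[1:] if len(r) >= 2}  # rows[0] is the header


def deleting_without_node(machine_output, node_output):
    """Return machines in phase Deleting that have no node of the same name."""
    machines = first_two_columns(machine_output)
    nodes = first_two_columns(node_output)
    return sorted(
        name for name, phase in machines.items()
        if phase == "Deleting" and name not in nodes
    )


MACHINES = """\
NAME                                   PHASE      TYPE              REGION   ZONE   AGE
huliu-az7a-d2lgf-master-0              Deleting   Standard_D8s_v3   westus          4h58m
huliu-az7a-d2lgf-master-1              Running    Standard_D8s_v3   westus          4h58m
huliu-az7a-d2lgf-master-rjxg9-2        Running    Standard_D8s_v3   westus          27m
"""

NODES = """\
NAME                                   STATUS   ROLES                  AGE     VERSION
huliu-az7a-d2lgf-master-1              Ready    control-plane,master   4h53m   v1.27.2+cc041e8
huliu-az7a-d2lgf-master-rjxg9-2        Ready    control-plane,master   23m     v1.27.2+cc041e8
"""

print(deleting_without_node(MACHINES, NODES))  # ['huliu-az7a-d2lgf-master-0']
```

The same check returns nothing for Case 2, where the Deleting machine's node still exists in NotReady state, so the two cases can be told apart.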

      Actual results:

      The MHC for the control plane does not work correctly in case 2 and case 3.

      Expected results:

      The MHC for the control plane should work correctly in all three cases.

      Additional info:

      Must-gather for case 2:
      https://drive.google.com/file/d/1eRa4tt7Mr5hMD8PVCOUgCLFHFvmx2J3T/view?usp=sharing
      
      Must-gather for case 3:
      https://drive.google.com/file/d/1ZdBXhQCDJvc2zcZhBmZaWicykzsbL3Z_/view?usp=sharing
      
      
      

        1. block.txt
          2.60 MB
        2. goroutine_full.txt
          604 kB
        3. goroutine.txt
          49 kB

            joelspeed Joel Speed
            huliu@redhat.com Huali Liu
            Huali Liu Huali Liu