  1. OpenShift Bugs
  2. OCPBUGS-4862

Deletion of BYOH Windows node hangs in Ready,SchedulingDisabled


    • Moderate
    • Rejected
    • False



      Description of problem:

      When a BYOH node is deleted, on either a Platform:none cluster or an Azure IPI cluster, the node is reconciled correctly, but once it is added back to the cluster it stays in Ready,SchedulingDisabled. The WMCO logs show the following error:
      {"level":"error","ts":"2022-12-14T16:14:31Z","msg":"Reconciler error","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","configMap":{"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"},"namespace":"openshift-windows-machine-config-operator","name":"windows-instances","reconcileID":"d66a3142-d52c-43f5-8a42-214ce9c88417","error":"error configuring host with address configuring node network failed: error waiting for k8s.ovn.org/hybrid-overlay-node-subnet node annotation for byoh-2019: timeout waiting for k8s.ovn.org/hybrid-overlay-node-subnet node annotation: timed out waiting for the condition"
      And when checking the node's annotation, it is indeed missing:
      $ oc get nodes byoh-2019 -o=jsonpath="{.metadata.annotations}"
      Tested on Azure IPI and Platform:None; the issue reproduced in both cases.
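
      The annotation check above can be narrowed to the one key WMCO is waiting for. The following is a minimal sketch, not part of the original report: `has_hybrid_overlay_subnet` is a hypothetical helper name, and it only does a fixed-string match on whatever annotations text is piped into it.

```shell
# Hypothetical helper (not from the original report): succeeds if the
# annotations text on stdin mentions the hybrid-overlay subnet key that
# the WMCO error message says it timed out waiting for.
has_hybrid_overlay_subnet() {
  grep -qF 'k8s.ovn.org/hybrid-overlay-node-subnet'
}

# Usage against a live cluster (requires oc and cluster access):
#   oc get nodes byoh-2019 -o=jsonpath="{.metadata.annotations}" \
#     | has_hybrid_overlay_subnet \
#     && echo "annotation present" || echo "annotation missing"
#
# Alternatively, query the single key directly; dots inside the key
# must be escaped in the jsonpath expression:
#   oc get nodes byoh-2019 \
#     -o=jsonpath='{.metadata.annotations.k8s\.ovn\.org/hybrid-overlay-node-subnet}'
```

      A fixed-string match is used so that the check does not depend on how the annotations map is rendered by the client.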

      Version-Release number of selected component (if applicable):

      $ oc get cm -n openshift-windows-machine-config-operator 
      NAME                                   DATA   AGE
      kube-root-ca.crt                       1      10h
      openshift-service-ca.crt               1      10h
      windows-instances                      2      9h
      windows-machine-config-operator-lock   0      6h24m
      windows-services-7.0.0-16f486a         2      6h23m
      $ oc get clusterversion
      version   4.12.0-rc.4   True        False         6h48m   Cluster version is 4.12.0-rc.4

      How reproducible:


      Steps to Reproduce:

      1. Deploy an OCP 4.11 cluster with WMCO 6.0.0
      2. Add one or two BYOH nodes to the cluster
      3. Upgrade the cluster to OCP 4.12, then upgrade WMCO to 7.0.0
      4. Remove one of the byoh nodes using: oc delete node <byoh-node-id>
      5. Wait for reconciliation to bring the node back

      Actual results:

      The deleted node gets re-added but stays in Ready,SchedulingDisabled, and its workloads are left in the Pending state.
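
      One way to spot the stuck node after step 5 is to filter the node list for entries whose status still includes SchedulingDisabled. This is a sketch under assumptions: `list_cordoned_nodes` is a hypothetical helper name, and it assumes the default `oc get nodes` table layout, with a header row and the status in the second column.

```shell
# Hypothetical helper (not from the original report): reads the default
# `oc get nodes` table on stdin and prints the name of every node whose
# STATUS column contains SchedulingDisabled. The header row is skipped.
list_cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage against a live cluster (requires oc and cluster access):
#   oc get nodes | list_cordoned_nodes
```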

      Expected results:

      The node gets properly added to the cluster and stays in Ready.

      Additional info:


        1. containerd.log
          8.35 MB
          Aharon Rasouli
        2. ip-10-0-151-124.us-east-2.compute.internal.containerd.log
          51 kB
          Aharon Rasouli
        3. ip-10-0-151-124.us-east-2.compute.internal.log
          289 kB
          Aharon Rasouli
        4. must-gather-byoh-node-delete.tar.gz
          22.62 MB
          Jose Luis Franco Arza
        5. wmco-byoh-node-delete.log
          72 kB
          Jose Luis Franco Arza

            jtanenba@redhat.com Jacob Tanenbaum
            rhn-engineering-jfrancoa Jose Luis Franco Arza (Inactive)
            Aharon Rasouli Aharon Rasouli