OpenShift Bugs / OCPBUGS-61188

Nodes fail to join HCP on KubeVirt for OCP 4.16.43 and 4.16.45 due to pending CSRs and missing machine-approver pod


    • Quality / Stability / Reliability

      Description of problem:

      When deploying HCP KubeVirt hosted clusters at versions 4.16.43 and 4.16.45 on a management OCP cluster at version 4.16.36, the worker nodes fail to join the hosted cluster.
      
      The nodes' CSRs remain in the Pending state indefinitely, which prevents the cluster from becoming fully operational.
      Manually approving the CSRs works as a temporary workaround and allows the nodes to join. The issue is not observed with HCP KubeVirt at version 4.16.36 (matching the management OCP version), where nodes join automatically as expected.
      
      Further investigation shows that the machine-approver pod is missing from the hosted control plane namespace for the affected versions (4.16.43 and 4.16.45). This is the likely root cause of the CSRs not being approved automatically.
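
The missing-pod observation above can be checked with a short sketch like the one below. It assumes the usual HyperShift convention that the hosted control plane namespace is named "<HostedCluster namespace>-<HostedCluster name>", and that the approver pod's name starts with "machine-approver". The helper names, and the `clusters` / `my-hcp` values in the usage comment, are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of any OpenShift tooling.

# Assumption: the hosted control plane namespace follows the
# "<HostedCluster namespace>-<HostedCluster name>" convention.
hcp_namespace() {
  printf '%s-%s\n' "$1" "$2"
}

# Reads `oc get pods --no-headers` output on stdin.
# Exits 0 if any pod name starts with "machine-approver", 1 otherwise.
has_machine_approver() {
  awk '$1 ~ /^machine-approver/ { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage (against the management cluster; names are placeholders):
#   ns="$(hcp_namespace clusters my-hcp)"
#   if oc get pods -n "$ns" --no-headers | has_machine_approver; then
#     echo "machine-approver present in $ns"
#   else
#     echo "machine-approver missing in $ns"
#   fi
```

On an affected hosted cluster (4.16.43 or 4.16.45), the report suggests this check would take the "missing" branch.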

      Version-Release number of selected component (if applicable):

          Hosted cluster: OCP 4.16.43 and 4.16.45; management cluster: OCP 4.16.36; MCE Operator v2.7.2

      How reproducible:

          100%

      Steps to Reproduce:

      1. Provision a management OCP cluster with version 4.16.36.
      
      2. Install the OpenShift Virtualization Operator and MCE Operator (v2.7.2).
      
      3. Deploy an HCP cluster on the KubeVirt platform, specifying OCP version 4.16.43 or 4.16.45 for the hosted cluster.
      
      4. Observe the status of the worker nodes (KubeVirt VMs) as they are created.
      
      5. Check the status of the Certificate Signing Requests by running oc get csr against the hosted cluster.
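
The CSR check in the last step, and the manual workaround mentioned in the description, could be sketched as follows. This is a minimal example that assumes the default table output of `oc get csr`, where the last column is CONDITION. The helper name `pending_csr_names` is hypothetical. Parsing table output is convenient for a quick check, but it is less robust than a structured output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration.

# Reads the default `oc get csr` table on stdin and prints the names of
# CSRs whose last column (CONDITION) is exactly "Pending".
pending_csr_names() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage (with the hosted cluster's kubeconfig active):
#   oc get csr | pending_csr_names
#
# Manual workaround from the report: approve the pending CSRs.
# Review the list first; --no-run-if-empty is a GNU xargs option.
#   oc get csr | pending_csr_names | xargs --no-run-if-empty oc adm certificate approve
```

Nodes typically submit a second (serving) CSR after the first one is approved, so the approval may need to be repeated until no Pending entries remain.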

      Actual results:

      The worker nodes do not join the cluster. The CSRs associated with the nodes remain in a Pending state. The machine-approver pod is not found in the hosted control plane's namespace. The cluster only becomes functional after an administrator manually approves the pending CSRs.

      Expected results:

      The worker nodes' CSRs should be automatically approved, and the nodes should seamlessly join the hosted cluster without manual intervention. The machine-approver pod should be present and running in the hosted control plane's namespace.
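
One way to confirm the expected outcome is to count the Ready nodes in the hosted cluster and compare the result with the NodePool's replica count. The sketch below assumes the default `oc get nodes --no-headers` layout, where the second column is STATUS. The helper name `ready_node_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration.

# Reads `oc get nodes --no-headers` output on stdin and prints how many
# nodes have STATUS exactly "Ready". Combined statuses such as
# "Ready,SchedulingDisabled" are deliberately not counted.
ready_node_count() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (with the hosted cluster's kubeconfig active):
#   oc get nodes --no-headers | ready_node_count
```

In the failing case described in this report, the workers never register, so the count would be expected to stay at 0 until the CSRs are approved.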

      Additional info:

      The MCE must-gather report, the NodePool list and YAML, the HostedCluster YAML, and other collected information are available at the Google Drive link below.
      
      https://drive.google.com/drive/folders/1qjbCmCh4xOYFJ1Kbr-DTahkWLIT-ZiD7?usp=sharing
      
      

              ocohen@redhat.com Oren Cohen
              rhn-support-dpateriy Divyam Pateriya
              Yu Li Yu Li