Uploaded image for project: 'OpenShift Virtualization'
  1. OpenShift Virtualization
  2. CNV-52266

OpenShift Virtualization upgrade is stuck on hosted control planes on bare metal

XMLWordPrintable

    • CNV I/U Operators Sprint 264, CNV Virt-Node Sprint 263
    • Critical
    • None

      Description of problem:

      OpenShift Virtualization is installed on one of the hosted cluster in baremetal hosted control plane. During the upgrade, "virt-operator-strategy-dumper" job is created. The job have the following affinity:

      # oc get job kubevirt-038e69c566dacce9f32740e77415213008080ae5-jobtnj4x -o yaml |yq '.spec.template.spec.affinity'
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
                - key: node-role.kubernetes.io/worker
                  operator: DoesNotExist
            weight: 100
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists
            - matchExpressions:
                - key: node-role.kubernetes.io/master
                  operator: Exists

      There is no master node in the hosted cluster since control plane runs on management/hub cluster.

      So pod fails to schedule:

      # oc get events|grep kubevirt-038e69c566dacce9f32740e77415213008080ae5-jobtnj4xsh5xd
      Unknown    Warning  FailedScheduling      Pod/kubevirt-038e69c566dacce9f32740e77415213008080ae5-jobtnj4xsh5xd  0/2 nodes are available: 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.

      And the upgrade is stuck with following error in virt-operator:

      2024-12-01T20:47:45.383727766Z {"component":"virt-operator","level":"info","msg":"Install strategy config map not loaded. reason: no install strategy configmap found for version sha256:99880052bdb7d896c4fa39a7d2956ce4be576a59ac4c398456cffccf04e17e57 with registry ","pos":"kubevirt.go:866","timestamp":"2024-12-01T20:47:45.383707Z"}
      
      2024-12-01T20:47:45.383749320Z {"component":"virt-operator","kind":"","level":"error","msg":"Waiting on install strategy to be posted from job kubevirt-038e69c566dacce9f32740e77415213008080ae5-jobtnj4x","name":"kubevirt-038e69c566dacce9f32740e77415213008080ae5-jobtnj4x","namespace":"openshift-cnv","pos":"kubevirt.go:918","timestamp":"2024-12-01T20:47:45.383726Z","uid":"603b91ee-52a6-4f39-b97e-6dc8b7959943"}

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization 4.16 to 4.17 upgrade on a HCP baremetal

      How reproducible:

      Observed in customer's environment

      Steps to Reproduce:

      1. Install OpenShift Virtualization on hosted control planes on bare metal cluster.
      2. Upgrade the hosted cluster. OpenShift Virtualization upgrade gets stuck with above mentioned errors.
      

      Actual results:

      OpenShift Virtualization upgrade is stuck on hosted control planes on bare metal

      Expected results:

      Upgrade should work.

      Additional info:

      Impacts ROSA-HCP customers on 4.17, attempting to run OpenShift Virtualization, trying to upgrade.

              ocohen@redhat.com Oren Cohen
              rhn-support-nashok Nijin Ashok
              Satheesaran Sundaramoorthi Satheesaran Sundaramoorthi
              Votes:
              3 Vote for this issue
              Watchers:
              16 Start watching this issue

                Created:
                Updated: