OpenShift Virtualization / CNV-41081

[4.16] CNV with LVMS on multi-node: cross-node cloning fails


    • Bug
    • Resolution: Done-Errata
    • Major
    • CNV v4.16.1
    • CNV v4.16.0
    • CNV Storage
    • None
    • 1
    • False
    • False
    • CNV v4.17.0.rhel9-60, CNV v4.16.1.rhel9-9
    • Release Notes
      For LVM storage (LVMS) on a multi-node cluster, use the 'copy' (host-assisted) cloneStrategy so that cross-node clones succeed. Apply this patch to the LVMS StorageProfile:

      oc patch storageprofile lvms-vg1 --type='merge' --patch='{"spec": {"cloneStrategy": "copy"}}'

      For LVM storage on a single-node OpenShift (SNO) cluster, use the 'csi-clone' cloneStrategy to benefit from the LVMS snapshot capability and make clones more efficient. Apply this patch to the LVMS StorageProfile:

      oc patch storageprofile lvms-vg1 --type='merge' --patch='{"spec": {"cloneStrategy": "csi-clone"}}' 
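
      To verify that the patch took effect (a quick check, assuming the StorageProfile is named lvms-vg1 as above), the effective strategy can be read back with:

      oc get storageprofile lvms-vg1 -o jsonpath='{.spec.cloneStrategy}'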
    • Known Issue
    • Proposed
    • ---
    • ---
    • Yes
    • Storage Core Sprint 257
    • No

      Description of problem:

      A VM cannot be provisioned from a boot source image that resides on a different node.

      Version-Release number of selected component (if applicable):

      4.16, 4.15 (LVMS on multi-node)

      How reproducible:

      Always

      Steps to Reproduce:

      1. Check which node the boot source image resides on (a one-line variant is sketched after the steps):
      $ oc get pv | grep fedora
      pvc-e604b04f-83dc-46c7-bb63-1633deb7ff59   30Gi       RWO            Delete           Bound    openshift-virtualization-os-images/fedora-722ac1d6b4f1           lvms-vg1       <unset>                          112m
      
      
      $ oc get pv pvc-e604b04f-83dc-46c7-bb63-1633deb7ff59 -ojson | jq .spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0]
      {
        "key": "topology.topolvm.io/node",
        "operator": "In",
        "values": [
          "c01-jp416-lvms-la2-7x598-worker-0-9z56j"
        ]
      }
      2. Create a VM on a different node:
      $ cat vm-clone-lvms-wrong-node.yaml
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: vm-fix-node
        namespace: openshift-virtualization-os-images
      spec:
        dataVolumeTemplates:
        - metadata:
            name: dv-fix-node
            namespace: openshift-virtualization-os-images
          spec:
            storage:
              resources:
                requests:
                  storage: 30Gi
              storageClassName: lvms-vg1
            source:
              pvc:
                namespace: openshift-virtualization-os-images
                name: fedora-722ac1d6b4f1
        running: true
        template:
          spec:
            nodeSelector:
              "kubernetes.io/hostname": "c01-jp416-lvms-la2-7x598-worker-0-78pt5"
            domain:
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: datavolume
              machine:
                type: ""
              resources:
                requests:
                  memory: 1Gi
            terminationGracePeriodSeconds: 0
            volumes:
            - dataVolume:
                name: dv-fix-node
              name: datavolume
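
      The node lookup in step 1 can also be done in one command; a minimal sketch, assuming the source PVC is named fedora-722ac1d6b4f1 as above:
      $ oc get pv "$(oc get pvc -n openshift-virtualization-os-images fedora-722ac1d6b4f1 -o jsonpath='{.spec.volumeName}')" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}'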
      

      Actual results:

      A VolumeSnapshot is created, the VM stays in the Provisioning state, the VMI stays in Scheduling, and the DV cannot be cloned.

      $ oc get vm -A
      NAMESPACE                            NAME          AGE   STATUS         READY
      openshift-virtualization-os-images   vm-fix-node   10m   Provisioning   False
      
      
      $ oc describe vm -n openshift-virtualization-os-images vm-fix-node
      ...
          Message:               Not all of the VMI's DVs are ready
          Reason:                NotAllDVsReady
          Status:                False
          Type:                  DataVolumesReady
          Last Probe Time:       <nil>
          Last Transition Time:  2024-04-21T14:27:08Z
          Message:               running PreBind plugin "VolumeBinding": binding volumes: context deadline exceeded
          Reason:                SchedulerError
          Status:                False
          Type:                  PodScheduled
      $ oc describe dv -n openshift-virtualization-os-images dv-fix-node
      Name:         dv-fix-node
      Namespace:    openshift-virtualization-os-images
      Labels:       instancetype.kubevirt.io/default-instancetype=u1.medium
                    instancetype.kubevirt.io/default-preference=fedora
                    kubevirt.io/created-by=f7f46074-d683-4cae-8466-25139e94aba7
      Annotations:  cdi.kubevirt.io/allowClaimAdoption: true
                    cdi.kubevirt.io/cloneType: snapshot
                    cdi.kubevirt.io/storage.clone.token:
                      eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MTM3MDkzMjgsImlhdCI6MTcxMzcwOTAyOCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJmZWRvcmEtNzIyYWMxZDZiNGYxIiwib...
                    cdi.kubevirt.io/storage.usePopulator: true
      API Version:  cdi.kubevirt.io/v1beta1
      Kind:         DataVolume
      Metadata:
        Creation Timestamp:  2024-04-21T14:17:08Z
        Generation:          1
        Owner References:
          API Version:           kubevirt.io/v1
          Block Owner Deletion:  true
          Controller:            true
          Kind:                  VirtualMachine
          Name:                  vm-fix-node
          UID:                   f7f46074-d683-4cae-8466-25139e94aba7
        Resource Version:        172803
        UID:                     e50b5c09-0c18-4570-9f59-369b99d0dec5
      Spec:
        Source:
          Pvc:
            Name:       fedora-722ac1d6b4f1
            Namespace:  openshift-virtualization-os-images
        Storage:
          Resources:
            Requests:
              Storage:         30Gi
          Storage Class Name:  lvms-vg1
      Status:
        Claim Name:  dv-fix-node
        Conditions:
          Last Heartbeat Time:   2024-04-21T14:17:08Z
          Last Transition Time:  2024-04-21T14:17:08Z
          Message:               PVC dv-fix-node Pending
          Reason:                Pending
          Status:                False
          Type:                  Bound
          Last Heartbeat Time:   2024-04-21T14:17:09Z
          Last Transition Time:  2024-04-21T14:17:08Z
          Reason:                TransferRunning
          Status:                False
          Type:                  Ready
          Last Heartbeat Time:   2024-04-21T14:17:08Z
          Last Transition Time:  2024-04-21T14:17:08Z
          Reason:                Populator is running
          Status:                True
          Type:                  Running
        Phase:                   PrepClaimInProgress
        Progress:                N/A
      Events:
        Type    Reason                           Age   From                             Message
        ----    ------                           ----  ----                             -------
        Normal  Pending                          10m   datavolume-pvc-clone-controller  PVC dv-fix-node Pending
        Normal  CloneScheduled                   10m   datavolume-pvc-clone-controller  Cloning from openshift-virtualization-os-images/fedora-722ac1d6b4f1 into openshift-virtualization-os-images/dv-fix-node scheduled
        Normal  SnapshotForSmartCloneInProgress  10m   datavolume-pvc-clone-controller  Creating snapshot for smart-clone is in progress (for pvc openshift-virtualization-os-images/fedora-722ac1d6b4f1)
        Normal  PrepClaimInProgress              10m   datavolume-pvc-clone-controller  Prepping PersistentVolumeClaim for DataVolume openshift-virtualization-os-images/dv-fix-node
      $ oc get pvc -n openshift-virtualization-os-images tmp-pvc-910e0ef5-61c2-4cf1-9b95-35d257b8f37d -oyaml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          cdi.kubevirt.io/allowClaimAdoption: "true"
          cdi.kubevirt.io/clonePhase: Snapshot
          cdi.kubevirt.io/cloneType: snapshot
          cdi.kubevirt.io/createdForDataVolume: e50b5c09-0c18-4570-9f59-369b99d0dec5
          cdi.kubevirt.io/storage.condition.running: "true"
          cdi.kubevirt.io/storage.condition.running.message: ""
          cdi.kubevirt.io/storage.condition.running.reason: Populator is running
          cdi.kubevirt.io/storage.contentType: kubevirt
          cdi.kubevirt.io/storage.pod.restarts: "0"
          cdi.kubevirt.io/storage.populator.kind: VolumeCloneSource
          cdi.kubevirt.io/storage.preallocation.requested: "false"
          cdi.kubevirt.io/storage.usePopulator: "true"
          pv.kubernetes.io/bind-completed: "yes"
          pv.kubernetes.io/bound-by-controller: "yes"
          volume.beta.kubernetes.io/storage-provisioner: topolvm.io
          volume.kubernetes.io/selected-node: c01-jp416-lvms-la2-7x598-worker-0-78pt5
          volume.kubernetes.io/storage-provisioner: topolvm.io
        creationTimestamp: "2024-04-21T14:17:09Z"
        finalizers:
        - kubernetes.io/pvc-protection
        labels:
          app: containerized-data-importer
          app.kubernetes.io/component: storage
          app.kubernetes.io/managed-by: cdi-controller
          app.kubernetes.io/part-of: hyperconverged-cluster
          app.kubernetes.io/version: 4.16.0
          cdi.kubevirt.io/OwnedByUID: 910e0ef5-61c2-4cf1-9b95-35d257b8f37d
          instancetype.kubevirt.io/default-instancetype: u1.medium
          instancetype.kubevirt.io/default-preference: fedora
          kubevirt.io/created-by: f7f46074-d683-4cae-8466-25139e94aba7
        name: tmp-pvc-910e0ef5-61c2-4cf1-9b95-35d257b8f37d
        namespace: openshift-virtualization-os-images
        resourceVersion: "172835"
        uid: 6b4e4245-a6fe-4fdb-8ef6-95d0af0a5294
      spec:
        accessModes:
        - ReadWriteOnce
        dataSource:
          apiGroup: snapshot.storage.k8s.io
          kind: VolumeSnapshot
          name: tmp-snapshot-910e0ef5-61c2-4cf1-9b95-35d257b8f37d
        dataSourceRef:
          apiGroup: snapshot.storage.k8s.io
          kind: VolumeSnapshot
          name: tmp-snapshot-910e0ef5-61c2-4cf1-9b95-35d257b8f37d
        resources:
          requests:
            storage: 30Gi
        storageClassName: lvms-vg1
        volumeMode: Block
        volumeName: pvc-6b4e4245-a6fe-4fdb-8ef6-95d0af0a5294
      status:
        accessModes:
        - ReadWriteOnce
        capacity:
          storage: 30Gi
        phase: Bound
       $ oc get pvc -n openshift-virtualization-os-images dv-fix-node -oyaml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          cdi.kubevirt.io/allowClaimAdoption: "true"
          cdi.kubevirt.io/clonePhase: PrepClaim
          cdi.kubevirt.io/cloneType: snapshot
          cdi.kubevirt.io/createdForDataVolume: e50b5c09-0c18-4570-9f59-369b99d0dec5
          cdi.kubevirt.io/storage.condition.running: "true"
          cdi.kubevirt.io/storage.condition.running.message: ""
          cdi.kubevirt.io/storage.condition.running.reason: Populator is running
          cdi.kubevirt.io/storage.contentType: kubevirt
          cdi.kubevirt.io/storage.pod.restarts: "0"
          cdi.kubevirt.io/storage.preallocation.requested: "false"
          cdi.kubevirt.io/storage.usePopulator: "true"
          volume.beta.kubernetes.io/storage-provisioner: topolvm.io
          volume.kubernetes.io/selected-node: c01-jp416-lvms-la2-7x598-worker-0-78pt5
          volume.kubernetes.io/storage-provisioner: topolvm.io
        creationTimestamp: "2024-04-21T14:17:08Z"
        finalizers:
        - kubernetes.io/pvc-protection
        - cdi.kubevirt.io/clonePopulator
        labels:
          app: containerized-data-importer
          app.kubernetes.io/component: storage
          app.kubernetes.io/managed-by: cdi-controller
          app.kubernetes.io/part-of: hyperconverged-cluster
          app.kubernetes.io/version: 4.16.0
          instancetype.kubevirt.io/default-instancetype: u1.medium
          instancetype.kubevirt.io/default-preference: fedora
          kubevirt.io/created-by: f7f46074-d683-4cae-8466-25139e94aba7
        name: dv-fix-node
        namespace: openshift-virtualization-os-images
        ownerReferences:
        - apiVersion: cdi.kubevirt.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: DataVolume
          name: dv-fix-node
          uid: e50b5c09-0c18-4570-9f59-369b99d0dec5
        resourceVersion: "172802"
        uid: 910e0ef5-61c2-4cf1-9b95-35d257b8f37d
      spec:
        accessModes:
        - ReadWriteOnce
        dataSource:
          apiGroup: cdi.kubevirt.io
          kind: VolumeCloneSource
          name: volume-clone-source-e50b5c09-0c18-4570-9f59-369b99d0dec5
        dataSourceRef:
          apiGroup: cdi.kubevirt.io
          kind: VolumeCloneSource
          name: volume-clone-source-e50b5c09-0c18-4570-9f59-369b99d0dec5
        resources:
          requests:
            storage: "32212254720"
        storageClassName: lvms-vg1
        volumeMode: Block
      status:
        phase: Pending  
      $ oc get VolumeSnapshot -A
      NAMESPACE                            NAME                                                READYTOUSE   SOURCEPVC             SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
      openshift-virtualization-os-images   tmp-snapshot-910e0ef5-61c2-4cf1-9b95-35d257b8f37d   true         fedora-722ac1d6b4f1                           30Gi          lvms-vg1        snapcontent-b0c600b7-6bcd-43c2-910f-049f96fb698d   17m            17m
      $ oc describe pods -n openshift-virtualization-os-images prep-910e0ef5-61c2-4cf1-9b95-35d257b8f37d | grep Events -A 10
      Events:
        Type     Reason       Age                          From     Message
        ----     ------       ----                         ----     -------
        Warning  FailedMount  19m                          kubelet  Unable to attach or mount volumes: unmounted volumes=[cdi-data-vol], unattached volumes=[], failed to process volumes=[cdi-data-vol]: error processing PVC openshift-virtualization-os-images/tmp-pvc-910e0ef5-61c2-4cf1-9b95-35d257b8f37d: PVC is not bound
        Warning  FailedMount  <invalid> (x11833 over 19m)  kubelet  MapVolume.NodeAffinity check failed for volume "pvc-6b4e4245-a6fe-4fdb-8ef6-95d0af0a5294" : no matching NodeSelectorTerms  
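
      The FailedMount events above point at the snapshot-restored temporary PV pvc-6b4e4245-a6fe-4fdb-8ef6-95d0af0a5294. A quick check (a sketch; the PV name is taken from the event above) of which node that PV is pinned to, for comparison with the node the prep pod was scheduled on:
      $ oc get pv pvc-6b4e4245-a6fe-4fdb-8ef6-95d0af0a5294 -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}'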

      Expected results:

      CDI should fall back to the host-assisted clone so that the cross-node clone succeeds.
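
      If the 'copy' cloneStrategy from the release note is applied to the lvms-vg1 StorageProfile, the clone type CDI actually selects can be read back from the DataVolume's cdi.kubevirt.io/cloneType annotation (it reports 'snapshot' in the output above and would presumably report 'copy' for a host-assisted clone):
      $ oc get dv -n openshift-virtualization-os-images dv-fix-node -o jsonpath='{.metadata.annotations.cdi\.kubevirt\.io/cloneType}'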

      Additional info:

       

            rhn-support-awels Alexander Wels
            jpeimer@redhat.com Jenia Peimer
            Jenia Peimer Jenia Peimer
            Votes: 0
            Watchers: 6

              Created:
              Updated:
              Resolved: