  Project: OpenShift Virtualization
  Issue: CNV-26300

[2174226] CephFS-based VM status changes to "paused" after migration - Release Note


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • CNV v4.13.0
    • CNV Documentation
    • Medium

      +++ This bug was initially created as a clone of Bug #2092271 +++

      Description of problem:
      Failed to migrate a VM that uses the ocs-storagecluster-cephfs storage class; the VM's status changes to Paused after the failed migration

      Version-Release number of selected component (if applicable):
      CNV 4.10

      How reproducible:
      100%

      Steps to Reproduce:
      1. Start a VM from a DV with storage class ocs-storagecluster-cephfs:
         oc create -f asb-vm-dv-ocs-cephfs.yaml
      2. Log in to the VM and touch a file named "migration".
      3. Try to migrate the VM in the web console by clicking "Migrate Node to Node";
        the VM is not migrated and its status is changed to Paused.

        oc get pod -o wide|grep virt-launcher| grep cephfs
        virt-launcher-asb-vm-dv-ocs-cephfs-dg747 1/1 Running 0 46s 10.129.1.92 dell-per730-64.lab.eng.pek2.redhat.com <none> 0/1
        virt-launcher-asb-vm-dv-ocs-cephfs-wfft5 1/1 Running 0 2m1s 10.128.1.61 dell-per730-63.lab.eng.pek2.redhat.com <none>

        oc get pod -o wide|grep virt-launcher| grep cephfs
        virt-launcher-asb-vm-dv-ocs-cephfs-dg747 0/1 Completed 0 71s 10.129.1.92 dell-per730-64.lab.eng.pek2.redhat.com <none> 0/1
        virt-launcher-asb-vm-dv-ocs-cephfs-wfft5 1/1 Running 0 2m26s 10.128.1.61 dell-per730-63.lab.eng.pek2.redhat.com <none>

        oc rsh virt-launcher-asb-vm-dv-ocs-cephfs-wfft5
        sh-4.4# virsh list --all
        Id Name State
        ---------------------------------------------------
        1 openshift-cnv_asb-vm-dv-ocs-cephfs paused
        sh-4.4# mount|grep cephfs
        172.30.225.152:6789,172.30.162.143:6789,172.30.149.241:6789:/volumes/csi/csi-vol-a604ad71-e17e-11ec-93c3-0a580a82017d/484a041a-0d62-43da-bcfe-218c2985be1f on /run/kubevirt-private/vmi-disks/rootdisk type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=ocs-storagecluster-cephfilesystem)

      4. Get the error messages:
      "server error. command Migrate failed: "migration job 60df6743-158c-4afd-b07f-01e1f7c6b33d already executed, finished at 2022-06-01 07:46:51.413073411 +0000 UTC, completed: true, failed: true, abortStatus: "

      Actual results:
      In step 3: the migration fails and the VM status changes to Paused

      Expected results:
      In step 3: the VM migrates successfully, or the operation is forbidden if it is not supported

      Additional info:

      • asb-vm-dv-ocs-cephfs.yaml
      • /var/log/libvirt/qemu/openshift-cnv_asb-vm-dv-ocs-cephfs.log

      — Additional comment from on 2022-06-01 08:29:42 UTC —

      — Additional comment from on 2022-06-01 08:30:31 UTC —

      — Additional comment from on 2022-06-01 12:23:29 UTC —

      Chenli, would you be able to re-test this scenario while using the RBD storage class? It might be that the issue here could be IO related.

      It would be helpful if you were able to capture the related virt-launcher and virt-handler logs. Would you also be able to post the Pod and VMI manifests?
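
      A minimal sketch of collecting those artifacts, assuming the VM runs in the openshift-cnv namespace and that virt-handler pods carry the usual kubevirt.io=virt-handler label (pod names are taken from the reproduction above):

      # virt-launcher log of the source pod; the "compute" container holds the libvirt/QEMU output
      oc logs -n openshift-cnv virt-launcher-asb-vm-dv-ocs-cephfs-wfft5 -c compute > virt-launcher-source.log
      # virt-handler logs
      oc logs -n openshift-cnv -l kubevirt.io=virt-handler > virt-handler.log
      # Pod and VMI manifests
      oc get pod virt-launcher-asb-vm-dv-ocs-cephfs-wfft5 -n openshift-cnv -o yaml > launcher-pod.yaml
      oc get vmi asb-vm-dv-ocs-cephfs -n openshift-cnv -o yaml > vmi.yaml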

      — Additional comment from on 2022-06-07 09:33:38 UTC —

      (In reply to sgott from comment #3)
      > Chenli, would you be able to re-test this scenario while using the RBD
      > storage class? It might be that the issue here could be IO related.
      >
      > It would be helpful if you were able to capture the related virt-launcher
      > and virt-handler logs. Would you also be able to post the Pod and VMI
      > manifests?

      Stu, I re-tested this scenario with the Ceph RBD storage class and migrated the VM from one node to another successfully.
      The issue only happens with VMs that use the cephfs storage class.

      Please see the attached file asb-vm-dv-ocs-cephfs.yaml for the VMI manifest,
      and the pod descriptions in the files virt-launcher-*-tjs6l-source/target.

      • Create VM
      1. oc create -f asb-vm-dv-ocs-cephfs.yaml
      2. oc get pod -o wide|grep virt-launcher
        virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l 1/1 Running 0 2m33s 10.128.1.72 dell-per730-63.lab.eng.pek2.redhat.com <none> 1/1
      • Migrate VM in web console
      1. oc get pod -o wide|grep virt-launcher
        virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx 0/1 ContainerCreating 0 4s <none> dell-per730-64.lab.eng.pek2.redhat.com <none> 0/1
        virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l 1/1 Running 0 2m49s 10.128.1.72 dell-per730-63.lab.eng.pek2.redhat.com <none> 1/1
      2. oc get pod -o wide|grep virt-launcher
        virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx 1/1 Running 0 9s 10.129.0.65 dell-per730-64.lab.eng.pek2.redhat.com <none> 0/1
        virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l 1/1 Running 0 2m54s 10.128.1.72 dell-per730-63.lab.eng.pek2.redhat.com <none> 1/1
      • Describe the pod information
      1. oc describe pod virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l > virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l-source
      2. oc describe pod virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx > virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx-target
      3. oc get pod -o wide|grep virt-launcher
        virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx 0/1 Completed 0 4m9s 10.129.0.65 dell-per730-64.lab.eng.pek2.redhat.com <none> 0/1
        virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l 1/1 Running 0 6m54s 10.128.1.72 dell-per730-63.lab.eng.pek2.redhat.com <none> 0/1
      4. oc rsh virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l
        sh-4.4# virsh list --all
        Id Name State
        ---------------------------------------------------
        1 openshift-cnv_asb-vm-dv-ocs-cephfs paused
      5. tail -f /var/log/libvirt/qemu/openshift-cnv_asb-vm-dv-ocs-cephfs.log
        -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
        -device virtio-balloon-pci-non-transitional,id=balloon0,bus=pci.5,addr=0x0 \
        -object rng-random,id=objrng0,filename=/dev/urandom \
        -device virtio-rng-pci-non-transitional,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
        -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
        -msg timestamp=on
        2022-06-07 09:08:10.706+0000: Domain id=1 is tainted: custom-ga-command
        2022-06-07 09:10:37.059+0000: initiating migration
        2022-06-07T09:10:42.355174Z qemu-kvm: warning: Failed to unlock byte 201
        2022-06-07T09:10:42.355267Z qemu-kvm: warning: Failed to unlock byte 201

      — Additional comment from on 2022-06-07 09:34:34 UTC —

      — Additional comment from on 2022-06-07 09:35:12 UTC —

      — Additional comment from on 2022-06-08 12:16:42 UTC —

      Cephfs does not support read-write-many as a valid mode, so it's not surprising that this sequence caused an IOError.

      However, this invalid/conflicting configuration should likely have been caught during provisioning. With that in mind, I'm changing the component to Storage for further evaluation. Please feel free to change the component if this appears to be in error.
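
      A quick sanity check of the access mode the PVC actually received could look like this (a sketch, using the PVC name and namespace that appear later in this report):

      oc get pvc asb-dv-ocs-cephfs -n openshift-cnv -o jsonpath='{.spec.accessModes}{"\n"}{.status.accessModes}{"\n"}'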

      — Additional comment from Alex Kalenyuk on 2022-06-15 12:36:28 UTC —

      Cephfs actually does ReadWriteMany:
      https://github.com/ceph/ceph-csi/blob/c85d03c79edcd46c0399dbd0fedd6a8be7703a58/examples/cephfs/pvc.yaml#L8
      Actually, we even tried to bring it to our upstream CI at one point:
      https://github.com/kubevirt/kubevirtci/pull/768

      So it should be eligible for migration AFAIK.
      Could I also join Stu's request for manifests and ask for the PVC & DataVolume?

      @chhu@redhat.com

      — Additional comment from Yan Du on 2022-06-29 12:22:23 UTC —

      Stu, I believe we have provided all the information from the storage side. It looks like a migration issue; can we move it to Virt?

      — Additional comment from on 2022-06-29 14:04:51 UTC —

      Thanks, Yan!

      — Additional comment from on 2022-07-05 06:25:23 UTC —

      — Additional comment from on 2022-07-05 06:25:58 UTC —

      — Additional comment from on 2022-07-05 06:26:30 UTC —

      — Additional comment from on 2022-07-05 06:27:56 UTC —

      Hi, Stu, Alex

      Please see the dv, pvc, pv information in attached files: dv.yaml, pvc.yaml, pv.yaml, thank you!

      1. oc get dv
        NAME PHASE PROGRESS RESTARTS AGE
        asb-dv-ocs-cephfs Succeeded 100.0% 146m
      2. oc get dv asb-dv-ocs-cephfs -o yaml >dv.yaml
      3. oc get pvc|grep asb-dv-ocs-cephfs
        asb-dv-ocs-cephfs Bound pvc-212aae52-7459-4d6b-bf6e-b9018bc56866 12Gi RWX ocs-storagecluster-cephfs 149m
      4. oc get pvc asb-dv-ocs-cephfs -o yaml >pvc.yaml
      5. oc get pv|grep asb-dv-ocs-cephfs
        pvc-212aae52-7459-4d6b-bf6e-b9018bc56866 12Gi RWX Delete Bound openshift-cnv/asb-dv-ocs-cephfs ocs-storagecluster-cephfs 150m
      6. oc get pv pvc-212aae52-7459-4d6b-bf6e-b9018bc56866 -o yaml >pv.yaml

      — Additional comment from Igor Bezukh on 2022-11-02 14:02:41 UTC —

      Hi,

      I will add CephFS support to KubevirtCI upstream and will try to reproduce it there.

      — Additional comment from Kedar Bidarkar on 2022-11-09 13:10:02 UTC —

      Deferring this to next release based on current progress related to this bug.

      — Additional comment from Igor Bezukh on 2022-12-14 10:02:13 UTC —

      Hi,

      I managed to reproduce the issue, but what fixed it was a configuration change to the CephFS CRD.

      Can you please provide us with the CephFileSystem CRD? I think it may be a misconfiguration of CephFS.

      The number of data and metadata replicas should be equal to the number of OSDs running on the cluster.

      TIA
      Igor
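
      A sketch of pulling the relevant replica settings out of the CephFileSystem CR and comparing them with the OSD count; the openshift-storage namespace, the ocs-storagecluster-cephfilesystem CR name (matching the mds_namespace in the mount output above), and the app=rook-ceph-osd label are assumptions based on a default ODF install:

      # Replica size of the metadata pool and of each data pool
      oc get cephfilesystem ocs-storagecluster-cephfilesystem -n openshift-storage \
        -o jsonpath='{.spec.metadataPool.replicated.size}{" "}{.spec.dataPools[*].replicated.size}{"\n"}'
      # Number of OSD pods currently running
      oc get pods -n openshift-storage -l app=rook-ceph-osd --field-selector=status.phase=Running --no-headers | wc -l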

      — Additional comment from Igor Bezukh on 2022-12-14 12:35:11 UTC —

      We also suspect this issue as the root cause:

      https://github.com/ceph/ceph-csi/issues/3562

      — Additional comment from Red Hat Bugzilla on 2022-12-15 08:29:04 UTC —

      Account disabled by LDAP Audit for extended failure

      — Additional comment from on 2023-01-04 03:29:21 UTC —

      (In reply to Igor Bezukh from comment #17)
      > Hi,
      >
      > I managed to reproduce the issue, but what fixed it is a configuration of
      > CephFS CRD
      >
      > can you please provide us with the CephFileSystem CRD? I think it may be
      > misconfiguration of CephFS.
      >
      > The number of data and metadata replicas should be equal to the number of
      > OSDs that are running on the cluster.
      >
      > TIA
      > Igor

      Hi Igor

      I'll set up the env and provide the CephFileSystem CRD later, thank you!

      — Additional comment from on 2023-01-16 06:54:49 UTC —

      (In reply to chhu from comment #20)
      > (In reply to Igor Bezukh from comment #17)
      > > Hi,
      > >
      > > I managed to reproduce the issue, but what fixed it is a configuration of
      > > CephFS CRD
      > >
      > > can you please provide us with the CephFileSystem CRD? I think it may be
      > > misconfiguration of CephFS.
      > >
      > > The number of data and metadata replicas should be equal to the number of
      > > OSDs that are running on the cluster.
      > >
      > > TIA
      > > Igor
      >
      > Hi Igor
      >
      > I'll setup the env and provide the CephFileSystem CRD later, thank you!

      Hi Igor

      I reproduced it in my environment with the steps in the "Description" section.
      For the environment setup, I just installed ODF
      and have not done any configuration of the CephFileSystem CRD.
      Will you please help check my env?
      I sent the env information to you by gchat, thank you!

      — Additional comment from Igor Bezukh on 2023-01-16 07:58:41 UTC —

      The issue that we see with live migration is a side effect of the original issue with CephFS RWX, as described here: https://github.com/ceph/ceph-csi/issues/3562

      I will move this bug to the CNV Storage team for further investigation.

      — Additional comment from Jan Safranek on 2023-01-23 11:26:34 UTC —

      OCP storage team here: if it's really https://github.com/ceph/ceph-csi/issues/3562, i.e. two Pods with different SELinux contexts are trying to use the same ReadWriteMany volume at the same time, then it's not a bug, but a feature of Kubernetes / OpenShift - it protects against data being "leaked" from a Pod to a different Pod that uses a different SELinux context. Please get the yaml of both Pods and check their pod.spec.securityContext.seLinuxOptions and/or "crictl inspect <container>" to confirm whether this is really the case.

      If two (or more) Pods want to share data on a volume, they must run with the same SELinux context (pod.spec.securityContext.seLinuxOptions or spec.containers[*].securityContext.seLinuxOptions of all the Pod's containers that have the volume mounted). If the fields are missing or empty, the container runtime will assign a random one to each Pod!

      In OpenShift, if the Pods are in the same namespace and their SCC has "SELinuxContext: type: MustRunAs" (e.g. the "restricted" SCC), OCP will assign the SELinux context to the Pods from namespace annotations, i.e. they should run with the same SELinux context and be able to share a volume. (If not, we have a bug somewhere.) However, if the Pods are in different namespaces or their SCC has a different "SELinuxContext" value, then their SELinux contexts are most probably different and they can't share data on a volume.

      It's somewhat documented at https://docs.openshift.com/container-platform/4.12/authentication/managing-security-context-constraints.html

      To sum up: if the "restricted" SCC is not enough for CNV, please use any other SCC that uses "SELinuxContext: type: MustRunAs" and all Pods in the same namespace will be able to share their volumes. There are other workarounds possible, but SCC would be the best.
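
      A minimal sketch of that check, using the source/target virt-launcher pod names from the reproduction earlier in this bug (the exact pod names will differ per run):

      # SELinux options requested at the pod and container level, for source and target pods
      oc get pod virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l -n openshift-cnv \
        -o jsonpath='{.spec.securityContext.seLinuxOptions}{"\n"}{.spec.containers[*].securityContext.seLinuxOptions}{"\n"}'
      oc get pod virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx -n openshift-cnv \
        -o jsonpath='{.spec.securityContext.seLinuxOptions}{"\n"}{.spec.containers[*].securityContext.seLinuxOptions}{"\n"}'
      # Which SCC admitted each pod (standard openshift.io/scc annotation)
      oc get pod virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l -n openshift-cnv -o yaml | grep 'openshift.io/scc'
      # On the node, the effective label can be read from the container runtime, e.g.:
      # crictl inspect <container-id> | grep -i selinux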

      — Additional comment from Adam Litke on 2023-02-20 17:41:56 UTC —

      Stu, can you take a look at comment #23 from Jan regarding SELinux contexts? It seems that the migration destination Pod really should start with the same context as the source.

            Shikha Jhala (sjhala@redhat.com)
            Catherine Tomasko (ctomasko)
            Kedar Bidarkar