OpenShift API for Data Protection / OADP-2173

kubevirt: Restore Partially Fails when including a DV or a VMI (CSI/Datamover)


    • Type: Bug
    • Resolution: Not a Bug
    • Affects Version: OADP 1.2.0
    • Component: kubevirt

      Description of problem:

      Restore partially fails if the VMI and the DV that created the VM disk (PVC) are not excluded.
      If those resources are excluded, the restore finishes successfully (CSI / Data Mover).

      The error:

        test-oadp-259:  error restoring datavolumes.cdi.kubevirt.io/test-oadp-259/test-vm-dv: admission webhook "datavolume-validate.cdi.kubevirt.io" denied the request: Destination PVC test-oadp-259/test-vm-dv already exists
                        error restoring virtualmachineinstances.kubevirt.io/test-oadp-259/test-vm: admission webhook "virtualmachineinstances-create-validator.kubevirt.io" denied the request: creation of the following reserved kubevirt.io/ labels on a VMI object is prohibited

       

      AFAIU: once the VM resource is restored, because it was in status Running it is expected to launch another virt-launcher
      pod and create a VMI. So when the restore then tries to restore the previous VMI while one already exists, it makes sense that it does not succeed.
      Regarding the DV, it likewise makes sense that the restore cannot create another PVC with the same name (the DV only exists
      to download the image and create the disk (PVC) for the VM, according to the VM manifest).

      I believe we should at least document this.
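
      As a possible workaround until this is documented: in my runs, a restore that explicitly excludes the VMI and the DV finishes successfully. A minimal sketch (the restore name here is made up; the resource names are taken from the errors above):

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: restore-dm-exclude   # hypothetical name
        namespace: openshift-adp
      spec:
        backupName: backup-dm-3
        excludedResources:
        # the restored VM object re-creates these on its own, so skip restoring them directly
        - virtualmachineinstances.kubevirt.io
        - datavolumes.cdi.kubevirt.io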

       

      Steps to reproduce:

      DPA:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        creationTimestamp: "2023-06-29T13:00:14Z"
        generation: 4
        name: dpa
        namespace: openshift-adp
        resourceVersion: "24245542"
        uid: 4bf0205b-0f73-4099-b1d8-d921009b1e22
      spec:
        backupLocations:
        - velero:
            config:
              insecureSkipTLSVerify: "true"
              profile: default
              region: minio
              s3ForcePathStyle: "true"
              s3Url: http://10.0.188.30:9000
            credential:
              key: cloud
              name: cloud-credentials-minio
            default: true
            objectStorage:
              bucket: kubevirt
              prefix: velero
            provider: aws
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - aws
            - csi
            - vsm
        features:
          dataMover:
            enable: true 
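
      Before running the backup, it can help to confirm that the DPA reconciled and the backup storage location is Available (a sketch; the resource names are the ones from the DPA above):

      oc get dataprotectionapplication dpa -n openshift-adp -o yaml
      oc get backupstoragelocations -n openshift-adp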

      Create a namespace and deploy the following VM:

      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: test-vm
        namespace: test-kubvirt
      spec:
        dataVolumeTemplates:
          - metadata:
              annotations:
                cdi.kubevirt.io/storage.deleteAfterCompletion: 'false'
              name: test-vm-dv
            spec:
              pvc:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 5Gi
                storageClassName: ocs-storagecluster-ceph-rbd
              source:
                registry:
                  pullMethod: node
                  url: 'docker://quay.io/kubevirt/fedora-with-test-tooling-container-disk'
        running: true
        template:
          metadata:
            creationTimestamp: null
            name: test-vm
          spec:
            domain:
              devices:
                disks:
                  - disk:
                      bus: virtio
                    name: volume0
                  - disk:
                      bus: virtio
                    name: volume1
                interfaces:
                  - macAddress: '02:d9:bd:00:00:AA'
                    masquerade: {}
                    name: default
                rng: {}
              machine:
                type: q35
              resources:
                requests:
                  memory: 256M
            networks:
              - name: default
                pod: {}
            terminationGracePeriodSeconds: 0
            volumes:
              - dataVolume:
                  name: test-vm-dv
                name: volume0
              - cloudInitNoCloud:
                  networkData: |-
                    ethernets:
                      eth0:
                        addresses:
                        - fd10:0:2::2/120
                        dhcp4: true
                        gateway6: fd10:0:2::1
                        match: {}
                        nameservers:
                          addresses:
                          - 10.96.0.10
                          search:
                          - default.svc.cluster.local
                          - svc.cluster.local
                          - cluster.local
                    version: 2
                name: volume1 
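
      A sketch of the corresponding commands, assuming the manifest above is saved as test-vm.yaml (the file name is mine):

      oc create namespace test-kubvirt
      oc apply -f test-vm.yaml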

      Note the DV that was created when the VM was deployed (to download the Fedora image)
      and the VMI (created because we set running: true):

      [amosmastbaum@fedora oadp-apps-deployer]$ oc get vmi
      NAME      AGE   PHASE     IP            NODENAME                     READY
      test-vm   78s   Running   10.131.0.63   mtv88-25qcc-worker-0-kjcx2   True
      [amosmastbaum@fedora oadp-apps-deployer]$ oc get dv
      NAME         PHASE       PROGRESS   RESTARTS   AGE
      test-vm-dv   Succeeded   100.0%                2m19s 

      Wait for the VMI condition AgentConnected:

      $ oc wait vmi oadp-270-1 -n test-kubvirt --for=condition=AgentConnected --timeout=1m && echo OK!
      virtualmachineinstance.kubevirt.io/oadp-270-1 condition met
      OK! 

      run backup:

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.25.10+8c21020
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "25"
        creationTimestamp: "2023-06-29T14:20:42Z"
        generation: 5
        labels:
          velero.io/storage-location: dpa-1
        name: backup-dm-3
        namespace: openshift-adp
        resourceVersion: "24249631"
        uid: d48b5529-5c9f-48d2-a0e4-8e749e198b7e
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToFsBackup: false
        hooks: {}
        includedNamespaces:
        - test-oadp-259
        itemOperationTimeout: 1h0m0s
        metadata: {}
        storageLocation: dpa-1
        ttl: 720h0m0s 
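
      To create the backup from a manifest and check that it completed, something like the following should work (the file name and the jsonpath check are mine, not from the report):

      oc create -f backup-dm-3.yaml
      oc get backup backup-dm-3 -n openshift-adp -o jsonpath='{.status.phase}'
      # or, with the velero CLI:
      velero backup describe backup-dm-3 -n openshift-adp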

      Remove the namespace:

      oc delete ns/kubvirt 

      run restore 

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2023-06-29T14:26:01Z"
        generation: 9
        name: restore-dm
        namespace: openshift-adp
        resourceVersion: "24256509"
        uid: e223e167-4ef1-4335-9206-3412000a0401
      spec:
        backupName: backup-dm-3
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        - csinodes.storage.k8s.io
        - volumeattachments.storage.k8s.io
        - backuprepositories.velero.io
        hooks: {}
        includedNamespaces:
        - '*'
        itemOperationTimeout: 1h0m0s 
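
      After the restore is created, its phase and the per-item errors can be inspected with (the logs command is the one suggested in the status output below):

      velero restore describe restore-dm -n openshift-adp
      velero restore logs restore-dm -n openshift-adp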

      Restore partially fails

      Name:         restore-dm
      Namespace:    openshift-adp
      Labels:       <none>
      Annotations:  <none>

      Phase:                       PartiallyFailed (run 'velero restore logs restore-dm' for more information)

      Total items to be restored:  48
      Items restored:              48

      Started:    2023-06-29 17:26:01 +0300 IDT
      Completed:  2023-06-29 17:27:56 +0300 IDT

      Warnings:
        Velero:     <none>
        Cluster:  could not restore, CustomResourceDefinition "clusterserviceversions.operators.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
                  could not restore, CustomResourceDefinition "datavolumes.cdi.kubevirt.io" already exists. Warning: the in-cluster version is different than the backed-up version.
                  could not restore, CustomResourceDefinition "virtualmachineinstances.kubevirt.io" already exists. Warning: the in-cluster version is different than the backed-up version.
                  could not restore, CustomResourceDefinition "virtualmachines.kubevirt.io" already exists. Warning: the in-cluster version is different than the backed-up version.
                  could not restore, ClusterRoleBinding "openshift-pipelines-clusterinterceptors" already exists. Warning: the in-cluster version is different than the backed-up version.
        Namespaces:
          test-oadp-259:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, ConfigMap "openshift-service-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, ClusterServiceVersion "openshift-pipelines-operator-rh.v1.11.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, ClusterServiceVersion "volsync-product.v0.7.2" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "admin" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "openshift-pipelines-edit" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "pipelines-scc-rolebinding" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version.
                          could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version.

      Errors:
        Velero:     <none>
        Cluster:    <none>
        Namespaces:
          test-oadp-259:  error restoring datavolumes.cdi.kubevirt.io/test-oadp-259/test-vm-dv: admission webhook "datavolume-validate.cdi.kubevirt.io" denied the request: Destination PVC test-oadp-259/test-vm-dv already exists
                          error restoring virtualmachineinstances.kubevirt.io/test-oadp-259/test-vm: admission webhook "virtualmachineinstances-create-validator.kubevirt.io" denied the request: creation of the following reserved kubevirt.io/ labels on a VMI object is prohibited

      Backup:  backup-dm-3

      Namespaces:
        Included:  all namespaces found in the backup
        Excluded:  <none>

      Resources:
        Included:        *
        Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
        Cluster-scoped:  auto

      Namespace mappings:  <none>

      Label selector:  <none>
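
      To double-check that the "already exists" errors correspond to objects the restored VM re-created by itself, listing the namespace after the restore should show a running VMI and a bound DV/PVC (a hypothetical verification step, not part of the original run):

      oc get vm,vmi,dv,pvc -n test-oadp-259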

      Version-Release number of selected component (if applicable):

       

      Expected results:

       

      Additional info:
      vm 

       apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        annotations:
          kubemacpool.io/transaction-timestamp: "2023-06-29T15:08:43.056759174Z"
          kubevirt.io/latest-observed-api-version: v1
          kubevirt.io/storage-observed-api-version: v1alpha3
        creationTimestamp: "2023-06-29T15:08:42Z"
        generation: 1
        name: test-vm
        namespace: test-oadp-259
        resourceVersion: "24308537"
        uid: 26e611f1-9f5f-4cfc-bb7e-7487e93b670e
      spec:
        dataVolumeTemplates:
        - metadata:
            annotations:
              cdi.kubevirt.io/storage.deleteAfterCompletion: "false"
            creationTimestamp: null
            name: test-vm-dv
          spec:
            pvc:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
            source:
              registry:
                pullMethod: node
                url: docker://quay.io/kubevirt/fedora-with-test-tooling-container-disk
        running: true
        template:
          metadata:
            creationTimestamp: null
            name: test-vm
          spec:
            domain:
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: volume0
                - disk:
                    bus: virtio
                  name: volume1
                interfaces:
                - macAddress: "06:01:01:05:03:04"
                  masquerade: {}
                  name: default
                rng: {}
              machine:
                type: q35
              resources:
                requests:
                  memory: 256M
            networks:
            - name: default
              pod: {}
            terminationGracePeriodSeconds: 0
            volumes:
            - dataVolume:
                name: test-vm-dv
              name: volume0
            - cloudInitNoCloud:
                networkData: |-
                  ethernets:
                    eth0:
                      addresses:
                      - fd10:0:2::2/120
                      dhcp4: true
                      gateway6: fd10:0:2::1
                      match: {}
                      nameservers:
                        addresses:
                        - 10.96.0.10
                        search:
                        - default.svc.cluster.local
                        - svc.cluster.local
                        - cluster.local
                  version: 2
              name: volume1
      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2023-06-29T15:09:49Z"
          status: "True"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: null
          message: 'cannot migrate VMI: PVC test-vm-dv is not shared, live migration requires
            that all PVCs must be shared (using ReadWriteMany access mode)'
          reason: DisksNotLiveMigratable
          status: "False"
          type: LiveMigratable
        - lastProbeTime: "2023-06-29T15:10:26Z"
          lastTransitionTime: null
          status: "True"
          type: AgentConnected
        created: true
        printableStatus: Running
        ready: true
        volumeSnapshotStatuses:
        - enabled: true
          name: volume0
        - enabled: false
          name: volume1
          reason: Snapshot is not supported for this volumeSource type [volume1]

      VMI

      apiVersion: kubevirt.io/v1
      kind: VirtualMachineInstance
      metadata:
        annotations:
          kubevirt.io/latest-observed-api-version: v1
          kubevirt.io/storage-observed-api-version: v1alpha3
        creationTimestamp: "2023-06-29T15:09:39Z"
        finalizers:
        - kubevirt.io/virtualMachineControllerFinalize
        - foregroundDeleteVirtualMachine
        generation: 12
        labels:
          kubevirt.io/nodeName: mtv88-25qcc-worker-0-kjcx2
        name: test-vm
        namespace: test-oadp-259
        ownerReferences:
        - apiVersion: kubevirt.io/v1
          blockOwnerDeletion: true
          controller: true
          kind: VirtualMachine
          name: test-vm
          uid: 26e611f1-9f5f-4cfc-bb7e-7487e93b670e
        resourceVersion: "24308543"
        uid: 5715c4dd-f5a1-403e-8f1a-48f88b75223e
      spec:
        domain:
          cpu:
            cores: 1
            model: host-model
            sockets: 1
            threads: 1
          devices:
            disks:
            - disk:
                bus: virtio
              name: volume0
            - disk:
                bus: virtio
              name: volume1
            interfaces:
            - macAddress: "06:01:01:05:03:04"
              masquerade: {}
              name: default
            rng: {}
          features:
            acpi:
              enabled: true
          firmware:
            uuid: b39c846b-d6f8-5364-bd54-85b990b826b2
          machine:
            type: q35
          resources:
            requests:
              memory: 256M
        networks:
        - name: default
          pod: {}
        terminationGracePeriodSeconds: 0
        volumes:
        - dataVolume:
            name: test-vm-dv
          name: volume0
        - cloudInitNoCloud:
            networkData: |-
              ethernets:
                eth0:
                  addresses:
                  - fd10:0:2::2/120
                  dhcp4: true
                  gateway6: fd10:0:2::1
                  match: {}
                  nameservers:
                    addresses:
                    - 10.96.0.10
                    search:
                    - default.svc.cluster.local
                    - svc.cluster.local
                    - cluster.local
              version: 2
          name: volume1
      status:
        activePods:
          0c52c11e-9626-4624-9590-5421c63006a2: mtv88-25qcc-worker-0-kjcx2
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2023-06-29T15:09:49Z"
          status: "True"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: null
          message: 'cannot migrate VMI: PVC test-vm-dv is not shared, live migration requires
            that all PVCs must be shared (using ReadWriteMany access mode)'
          reason: DisksNotLiveMigratable
          status: "False"
          type: LiveMigratable
        - lastProbeTime: "2023-06-29T15:10:26Z"
          lastTransitionTime: null
          status: "True"
          type: AgentConnected
        guestOSInfo:
          id: fedora
          kernelRelease: 5.6.6-300.fc32.x86_64
          kernelVersion: '#1 SMP Tue Apr 21 13:44:19 UTC 2020'
          name: Fedora
          prettyName: Fedora 32 (Cloud Edition)
          version: "32"
          versionId: "32"
        interfaces:
        - infoSource: domain, guest-agent
          interfaceName: eth0
          ipAddress: 10.131.0.63
          ipAddresses:
          - 10.131.0.63
          mac: "06:01:01:05:03:04"
          name: default
          queueCount: 1
        launcherContainerImageVersion: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:d0006af62737fa43b55458f08e0e393d916f3ad45586b6324a2ba0e2794ec00a
        migrationMethod: BlockMigration
        migrationTransport: Unix
        nodeName: mtv88-25qcc-worker-0-kjcx2
        phase: Running
        phaseTransitionTimestamps:
        - phase: Pending
          phaseTransitionTimestamp: "2023-06-29T15:09:39Z"
        - phase: Scheduling
          phaseTransitionTimestamp: "2023-06-29T15:09:39Z"
        - phase: Scheduled
          phaseTransitionTimestamp: "2023-06-29T15:09:49Z"
        - phase: Running
          phaseTransitionTimestamp: "2023-06-29T15:09:52Z"
        qosClass: Burstable
        runtimeUser: 107
        selinuxContext: system_u:object_r:container_file_t:s0:c43,c738
        virtualMachineRevisionName: revision-start-vm-26e611f1-9f5f-4cfc-bb7e-7487e93b670e-1
        volumeStatus:
        - name: volume0
          persistentVolumeClaimInfo:
            accessModes:
            - ReadWriteOnce
            capacity:
              storage: 5Gi
            filesystemOverhead: "0.055"
            requests:
              storage: 5Gi
            volumeMode: Filesystem
          target: vda
        - name: volume1
          size: 1048576
          target: vdb
      

      dv

      apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        annotations:
          cdi.kubevirt.io/storage.deleteAfterCompletion: "false"
        creationTimestamp: "2023-06-29T15:08:43Z"
        generation: 18
        labels:
          kubevirt.io/created-by: 26e611f1-9f5f-4cfc-bb7e-7487e93b670e
        name: test-vm-dv
        namespace: test-oadp-259
        ownerReferences:
        - apiVersion: kubevirt.io/v1
          blockOwnerDeletion: true
          controller: true
          kind: VirtualMachine
          name: test-vm
          uid: 26e611f1-9f5f-4cfc-bb7e-7487e93b670e
        resourceVersion: "24307567"
        uid: 424e857f-ebd9-49ae-94fb-69307342259a
      spec:
        pvc:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
        source:
          registry:
            pullMethod: node
            url: docker://quay.io/kubevirt/fedora-with-test-tooling-container-disk
      status:
        claimName: test-vm-dv
        conditions:
        - lastHeartbeatTime: "2023-06-29T15:08:43Z"
          lastTransitionTime: "2023-06-29T15:08:43Z"
          message: PVC test-vm-dv Bound
          reason: Bound
          status: "True"
          type: Bound
        - lastHeartbeatTime: "2023-06-29T15:09:39Z"
          lastTransitionTime: "2023-06-29T15:09:39Z"
          status: "True"
          type: Ready
        - lastHeartbeatTime: "2023-06-29T15:09:08Z"
          lastTransitionTime: "2023-06-29T15:09:08Z"
          message: Import Complete
          reason: Completed
          status: "False"
          type: Running
        phase: Succeeded
        progress: 100.0%
      
