OpenShift API for Data Protection / OADP-794

Second restore of CSI volume fails because dataSource doesn't match dataSourceRef


      Description of problem:

      Using OADP 1.1 on OCP 4.11, I created a PVC that was provisioned using the ODF LVM CSI driver:

      $ oc get pvc toolbox-container-home -o yaml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          pv.kubernetes.io/bind-completed: "yes"
          pv.kubernetes.io/bound-by-controller: "yes"
          volume.beta.kubernetes.io/storage-provisioner: topolvm.cybozu.com
          volume.kubernetes.io/selected-node: master1
          volume.kubernetes.io/storage-provisioner: topolvm.cybozu.com
        creationTimestamp: "2022-09-15T15:36:24Z"
        finalizers:
        - kubernetes.io/pvc-protection
        name: toolbox-container-home
        namespace: test-volume
        resourceVersion: "10658302"
        uid: ac618f12-b6a5-46ad-ba96-cb5613b9dd7e
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1G
        storageClassName: odf-lvm-loop1-vg
        volumeMode: Filesystem
        volumeName: pvc-ac618f12-b6a5-46ad-ba96-cb5613b9dd7e
      status:
        accessModes:
        - ReadWriteOnce
        capacity:
          storage: 1Gi
        phase: Bound
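
      For reference, the claim was originally created from a minimal manifest, reconstructed here from the kubectl.kubernetes.io/last-applied-configuration annotation captured in the audit event further below (no storageClassName is set, so the cluster default was presumably used):

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: toolbox-container-home
        namespace: test-volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1G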
      

      I used OADP to back up the PVC and restore it:

      $ velero backup create --from-schedule test-volume --wait
      $ velero create restore --from-backup test-volume-20220914003129 --wait
      

      After the restore operation completes, the PVC object looks like this:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          pv.kubernetes.io/bind-completed: "yes"
          pv.kubernetes.io/bound-by-controller: "yes"
          velero.io/backup-name: test-volume-20220914003129
          velero.io/volume-snapshot-name: velero-toolbox-container-home-vf52g
          volume.kubernetes.io/selected-node: master1
          volume.kubernetes.io/storage-provisioner: topolvm.cybozu.com
        creationTimestamp: "2022-09-15T01:43:47Z"
        finalizers:
        - kubernetes.io/pvc-protection
        labels:
          velero.io/backup-name: test-volume-20220914003129
          velero.io/restore-name: test-volume-20220914003129-20220914184324
          velero.io/volume-snapshot-name: velero-toolbox-container-home-vf52g
        name: toolbox-container-home
        namespace: test-volume
        resourceVersion: "10590150"
        uid: d29cdc38-5c09-4243-b10c-eed0083332d4
      spec:
        accessModes:
        - ReadWriteOnce
        dataSource:
          apiGroup: snapshot.storage.k8s.io
          kind: VolumeSnapshot
          name: velero-toolbox-container-home-vf52g
        dataSourceRef:
          apiGroup: snapshot.storage.k8s.io
          kind: VolumeSnapshot
          name: velero-toolbox-container-home-vf52g
        resources:
          requests:
            storage: 1G
        storageClassName: odf-lvm-loop1-vg
        volumeMode: Filesystem
        volumeName: pvc-d29cdc38-5c09-4243-b10c-eed0083332d4
      status:
        accessModes:
        - ReadWriteOnce
        capacity:
          storage: 1Gi
        phase: Bound
      

      Note that the dataSource and dataSourceRef fields in the above object both point to the VolumeSnapshot velero-toolbox-container-home-vf52g.

      Next, I created another backup of this PVC and tried to restore it:

      $ velero backup create --from-schedule test-volume --wait
      $ velero create restore --from-backup test-volume-20220914004727 --wait
      

      This time the restore operation failed: when both dataSource and dataSourceRef are set on a claim, the API server requires them to reference the same object, and as shown below they no longer do.

      $ velero restore describe test-volume-20220914004727-20220914151512
      ...
      Errors:
        Velero:     <none>
        Cluster:    <none>
        Namespaces:
          test-volume:  error restoring persistentvolumeclaims/test-volume/toolbox-container-home: PersistentVolumeClaim "toolbox-container-home" is invalid: spec: Invalid value: field.Path{name:"dataSource", index:"", parent:(*field.Path)(0xc07cac41e0)}: must match dataSourceRef
      ...
      

      During the PVC restore operation, Velero made the following API call, which the API server rejected with HTTP 422:

      {
        "kind": "Event",
        "apiVersion": "audit.k8s.io/v1",
        "level": "RequestResponse",
        "auditID": "037e9c67-7bf7-4169-8689-109124eda44c",
        "stage": "ResponseComplete",
        "requestURI": "/api/v1/namespaces/test-volume/persistentvolumeclaims",
        "verb": "create",
        "user": {
          "username": "system:serviceaccount:openshift-adp:velero",
          "uid": "1fd41968-4739-4e85-9b55-896272960be7",
          "groups": [
            "system:serviceaccounts",
            "system:serviceaccounts:openshift-adp",
            "system:authenticated"
          ],
          "extra": {
            "authentication.kubernetes.io/pod-name": [
              "velero-6898bff594-gjlrm"
            ],
            "authentication.kubernetes.io/pod-uid": [
              "4cd639aa-4b96-4992-b8f3-c8ff0f16ffbb"
            ]
          }
        },
        "sourceIPs": [
          "10.128.1.11"
        ],
        "userAgent": "velero-server/v1.9.0-OADP (linux/amd64) -",
        "objectRef": {
          "resource": "persistentvolumeclaims",
          "namespace": "test-volume",
          "name": "toolbox-container-home",
          "apiVersion": "v1"
        },
        "responseStatus": {
          "metadata": {},
          "status": "Failure",
          "message": "PersistentVolumeClaim \"toolbox-container-home\" is invalid: spec: Invalid value: field.Path{name:\"dataSource\", index:\"\", parent:(*field.Path)(0xc037964390)}: must match dataSourceRef",
          "reason": "Invalid",
          "details": {
            "name": "toolbox-container-home",
            "kind": "PersistentVolumeClaim",
            "causes": [
              {
                "reason": "FieldValueInvalid",
                "message": "Invalid value: field.Path{name:\"dataSource\", index:\"\", parent:(*field.Path)(0xc037964390)}: must match dataSourceRef",
                "field": "spec"
              }
            ]
          },
          "code": 422
        },
        "requestObject": {
          "kind": "PersistentVolumeClaim",
          "apiVersion": "v1",
          "metadata": {
            "name": "toolbox-container-home",
            "namespace": "test-volume",
            "creationTimestamp": null,
            "labels": {
              "velero.io/backup-name": "test-volume-20220914004727",
              "velero.io/restore-name": "test-volume-20220914004727-20220914174140",
              "velero.io/volume-snapshot-name": "velero-toolbox-container-home-42d9p"
            },
            "annotations": {
              "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"PersistentVolumeClaim\",\"metadata\":{\"annotations\":{},\"name\":\"toolbox-container-home\",\"namespace\":\"test-volume\"},\"spec\":{\"accessModes\":[\"ReadWriteOnce\"],\"resources\":{\"requests\":{\"storage\":\"1G\"}}}}\n",
              "velero.io/backup-name": "test-volume-20220914004727",
              "velero.io/volume-snapshot-name": "velero-toolbox-container-home-42d9p",
              "volume.kubernetes.io/storage-provisioner": "topolvm.cybozu.com"
            }
          },
          "spec": {
            "accessModes": [
              "ReadWriteOnce"
            ],
            "resources": {
              "requests": {
                "storage": "1Gi"
              }
            },
            "storageClassName": "odf-lvm-loop1-vg",
            "volumeMode": "Filesystem",
            "dataSource": {
              "apiGroup": "snapshot.storage.k8s.io",
              "kind": "VolumeSnapshot",
              "name": "velero-toolbox-container-home-42d9p"
            },
            "dataSourceRef": {
              "apiGroup": "snapshot.storage.k8s.io",
              "kind": "VolumeSnapshot",
              "name": "velero-toolbox-container-home-vf52g"
            }
          },
          "status": {
            "phase": "Pending"
          }
        },
        "responseObject": {
          "kind": "Status",
          "apiVersion": "v1",
          "metadata": {},
          "status": "Failure",
          "message": "PersistentVolumeClaim \"toolbox-container-home\" is invalid: spec: Invalid value: field.Path{name:\"dataSource\", index:\"\", parent:(*field.Path)(0xc037964390)}: must match dataSourceRef",
          "reason": "Invalid",
          "details": {
            "name": "toolbox-container-home",
            "kind": "PersistentVolumeClaim",
            "causes": [
              {
                "reason": "FieldValueInvalid",
                "message": "Invalid value: field.Path{name:\"dataSource\", index:\"\", parent:(*field.Path)(0xc037964390)}: must match dataSourceRef",
                "field": "spec"
              }
            ]
          },
          "code": 422
        },
        "requestReceivedTimestamp": "2022-09-15T00:42:03.895874Z",
        "stageTimestamp": "2022-09-15T00:42:03.918537Z",
        "annotations": {
          "authorization.k8s.io/decision": "allow",
          "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"oadp-operator.v1.1.0-7668f46dbc\" of ClusterRole \"oadp-operator.v1.1.0-7668f46dbc\" to ServiceAccount \"velero/openshift-adp\""
        }
      }
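
      For reference, this event was captured from the kube-apiserver audit log. On OpenShift the log can be pulled from the control-plane nodes along these lines (assuming an audit profile such as WriteRequestBodies that records request bodies; the grep filters are just one way to narrow it down):

      $ oc adm node-logs --role=master --path=kube-apiserver/audit.log \
          | grep toolbox-container-home | grep '"code":422'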
      

      The problem is probably in the resetPVCSpec method of the Velero CSI plugin.

      This method resets only the dataSource field and leaves dataSourceRef unchanged. A PVC that was itself created by an earlier restore therefore keeps the old snapshot name (velero-toolbox-container-home-vf52g) in dataSourceRef while dataSource is rewritten to the new snapshot (velero-toolbox-container-home-42d9p), which is exactly the mismatch visible in the requestObject above. The method should probably reset both dataSource and dataSourceRef.
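
      A minimal sketch of such a fix, assuming the k8s.io/api core/v1 types contemporary with this report (where both fields are *corev1.TypedLocalObjectReference; newer API versions type dataSourceRef as *TypedObjectReference). The function name and signature are illustrative, not the actual plugin code:

      package restore

      import (
          corev1 "k8s.io/api/core/v1"
      )

      // resetPVCDataSources points the to-be-restored claim at the freshly
      // created VolumeSnapshot. Overwriting only Spec.DataSource (the reported
      // behavior) leaves a stale Spec.DataSourceRef from an earlier restore in
      // place, and the API server then rejects the claim with "must match
      // dataSourceRef". Overwriting both fields keeps them consistent.
      func resetPVCDataSources(pvc *corev1.PersistentVolumeClaim, snapshotName string) {
          apiGroup := "snapshot.storage.k8s.io"
          pvc.Spec.DataSource = &corev1.TypedLocalObjectReference{
              APIGroup: &apiGroup,
              Kind:     "VolumeSnapshot",
              Name:     snapshotName,
          }
          // Setting DataSourceRef to nil would also work; with the
          // AnyVolumeDataSource feature enabled, the API server fills it in
          // from DataSource on admission.
          pvc.Spec.DataSourceRef = pvc.Spec.DataSource.DeepCopy()
      }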

        Assignee: Shubham Pampattiwar
        Reporter: Ales Nosek