OpenShift API for Data Protection / OADP-1959

Restic backup is failing due to NetApp enabled permission squashing

      Description of problem:

      The customer in case 03496719 reports that backups through Restic do not complete. The error received is:

      time="2023-04-24T14:31:32Z" level=error msg="Error backing up item" backup=openshift-adp/backup-test01 error="pod volume backup failed: running Restic backup, stderr={\"message_type\":\"error\",\"error\":

      {\"Op\":\"open\",\"Path\":\".java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-csb9b-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\",\"Err\":13}

      ,\"during\":\"archival\",\"item\":\"/host_pods/7d46648f-1b80-479a-b9bb-d05da9d21935/volumes/kubernetes.io~csi/pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9/mount/.java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-csb9b-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\"}\n{\"message_type\":\"error\",\"error\":

      {\"Op\":\"open\",\"Path\":\".java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-wtkgb-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\",\"Err\":13}

      ,\"during\":\"archival\",\"item\":\"/host_pods/7d46648f-1b80-479a-b9bb-d05da9d21935/volumes/kubernetes.io~csi/pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9/mount/.java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-wtkgb-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\"}\n{\"message_type\":\"error\",\"error\":

      {\"Op\":\"open\",\"Path\":\"identity.key.enc\",\"Err\":13}

      ,\"during\":\"archival\",\"item\":\"/host_pods/7d46648f-1b80-479a-b9bb-d05da9d21935/volumes/kubernetes.io~csi/pvc-2dfcb154-b20f-42a2-915d-
      The observed state of the Restic backup is partially failed. This seems related to the issues discussed in the following KCS articles:

      https://access.redhat.com/solutions/6986857

      and

      https://access.redhat.com/solutions/6987288

       

      but these were marked as resolved in OADP 1.1+. We are unsure whether this is related and are asking OADP engineering for assistance in determining what data to collect and what our next steps toward a resolution should be. OADP and cluster must-gathers are available through supportshell in the case.
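      When triaging output like the Restic stderr quoted above, it can help to decode the numeric `Err` field. On Linux, errno 13 is `EACCES` (permission denied), which is consistent with a permissions problem on the volume. The following is a minimal sketch, not part of OADP or Velero: the function name `summarize_restic_errors` and the sample `item` path are hypothetical, and it assumes the stderr has already been unescaped into one JSON object per line.

```python
import errno
import json
import os


def summarize_restic_errors(stderr_text):
    """Return (path, errno_name, description) for each Restic JSON error line.

    Hypothetical helper for triage; assumes one JSON object per line.
    """
    results = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON or truncated lines
        if not isinstance(msg, dict) or msg.get("message_type") != "error":
            continue
        err = msg.get("error")
        if not isinstance(err, dict):
            continue
        code = err.get("Err")
        if isinstance(code, int):
            name = errno.errorcode.get(code, "UNKNOWN")
            desc = os.strerror(code)
        else:
            name, desc = "UNKNOWN", ""
        results.append((err.get("Path"), name, desc))
    return results


# Sample line modeled on the log above; the "item" path is a placeholder.
SAMPLE = (
    '{"message_type":"error","error":{"Op":"open",'
    '"Path":"identity.key.enc","Err":13},'
    '"during":"archival","item":"/example/mount/identity.key.enc"}'
)

if __name__ == "__main__":
    for path, name, desc in summarize_restic_errors(SAMPLE):
        print(path, name, desc)
```

      Lines that are not valid JSON, such as the truncated final entry in the log above, are skipped rather than guessed at.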

      Version-Release number of selected component (if applicable):

      oadp 1.1.3

       

      Actual results:

      The Restic backup fails and remains in a partially failed state without advancing.

      Expected results:

      Volume backup to complete as expected.

      Additional info:

      It was thought that this behavior might be caused by vestigial components of previous OADP versions, so the operator was completely uninstalled and then reinstalled at the current version. The same behavior occurred.

      The customer is using AWS S3 for backing storage.

      The OADP namespace was retained during the operator uninstall/reinstall.

       

      Tiger's notes from reviewing the must-gather
      PV definition:

      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        annotations:
          pv.kubernetes.io/provisioned-by: csi.trident.netapp.io
          volume.kubernetes.io/provisioner-deletion-secret-name: ""
          volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
        finalizers:
        - kubernetes.io/pv-protection
        - external-attacher/csi-trident-netapp-io
        name: pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9
      spec:
        accessModes:
        - ReadWriteOnce
        capacity:
          storage: 3Gi
        claimRef:
          apiVersion: v1
          kind: PersistentVolumeClaim
          name: jenkins-pv-claim
          namespace: devops-tools
          resourceVersion: "1453757966"
          uid: 2dfcb154-b20f-42a2-915d-7427741fc1c9
        csi:
          driver: csi.trident.netapp.io
          volumeAttributes:
            backendUUID: edabf770-8e67-4c1d-b647-803b5cbd2790
            internalName: trident_pvc_2dfcb154_b20f_42a2_915d_7427741fc1c9
            name: pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9
            protocol: file
            storage.kubernetes.io/csiProvisionerIdentity: 1678969996682-8081-csi.trident.netapp.io
          volumeHandle: pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9
        persistentVolumeReclaimPolicy: Delete
        storageClassName: ontap-nas
        volumeMode: Filesystem
      status:
        phase: Bound
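
      The PV above is provisioned by Trident with the `ontap-nas` storage class, which is NFS-backed. If the export squashes root, the Restic process would presumably be mapped to an anonymous user, so only the "other" permission bits would apply to it. Under that assumption, the sketch below lists the paths in a directory tree that such a user could not read, which may help identify which files will fail. The function name `paths_not_world_readable` is hypothetical; this is a diagnostic aid run against a mounted copy of the volume, not an OADP feature, and it ignores ACLs and supplemental groups.

```python
import os
import stat


def paths_not_world_readable(root):
    """Return sorted paths under root lacking 'other' read access.

    Assumption: the reader is a squashed (anonymous) user that matches
    neither the owner nor the group, so only the 'other' bits count.
    Directories also need the 'other' execute bit to be traversed.
    """
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                blocked.append(path)  # cannot even stat the entry
                continue
            if stat.S_ISLNK(mode):
                continue  # symlink permissions are not meaningful
            needed = stat.S_IROTH
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXOTH
            if mode & needed != needed:
                blocked.append(path)
    return sorted(blocked)
```

      For example, a file with mode 0600 such as a private key would be reported, while a 0644 file would not.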
      

      Relevant upstream issues:
      https://github.com/NetApp/trident/issues/561
      https://github.com/NetApp/trident/issues/269#issuecomment-523884992
      https://github.com/openshift/oadp-operator/issues/133
      https://github.com/openshift/oadp-operator/issues/179
      https://docs.openshift.com/container-platform/4.13/migration_toolkit_for_containers/troubleshooting-mtc.html#restic-permission-error-when-migrating-from-nfs-storage-with-root-squash-enabled_troubleshooting-mtc

            tkaovila@redhat.com Tiger Kaovilai
            braander@redhat.com Brandon Anderson
            Amos Mastbaum Amos Mastbaum