Description of problem:
The customer in case 03496719 is reporting problems completing a backup through Restic. The error received is:
time="2023-04-24T14:31:32Z" level=error msg="Error backing up item" backup=openshift-adp/backup-test01 error="pod volume backup failed: running Restic backup, stderr={\"message_type\":\"error\",\"error\":
{\"Op\":\"open\",\"Path\":\".java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-csb9b-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\",\"Err\":13},\"during\":\"archival\",\"item\":\"/host_pods/7d46648f-1b80-479a-b9bb-d05da9d21935/volumes/kubernetes.io~csi/pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9/mount/.java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-csb9b-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\"}\n{\"message_type\":\"error\",\"error\":
{\"Op\":\"open\",\"Path\":\".java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-wtkgb-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\",\"Err\":13},\"during\":\"archival\",\"item\":\"/host_pods/7d46648f-1b80-479a-b9bb-d05da9d21935/volumes/kubernetes.io~csi/pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9/mount/.java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-wtkgb-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties\"}\n{\"message_type\":\"error\",\"error\":
{\"Op\":\"open\",\"Path\":\"identity.key.enc\",\"Err\":13},\"during\":\"archival\",\"item\":\"/host_pods/7d46648f-1b80-479a-b9bb-d05da9d21935/volumes/kubernetes.io~csi/pvc-2dfcb154-b20f-42a2-915d-
The observed state of the Restic backup is PartiallyFailed. This seems related to the issues discussed in the KCS articles
https://access.redhat.com/solutions/6986857
and
https://access.redhat.com/solutions/6987288
but these were marked as resolved in OADP 1.1+. We are unsure whether this is related and are asking OADP engineering for help determining what data to collect and what our next steps toward a resolution should be. OADP and cluster must-gathers are available through supportshell in the case.
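Each Restic stderr entry in the log above is a JSON object whose nested "error" appears to be a JSON-encoded Go *os.PathError (fields Op, Path, Err), so "Err":13 is the raw errno. On Linux, errno 13 is EACCES ("permission denied"). The following is a minimal Go sketch that decodes such lines and lists the paths that failed with EACCES. The resticMessage type and permissionDenied function are hypothetical helpers modeled only on the fields visible in the log, not Restic or Velero APIs. The sample messages are copied from the log with the "item" field omitted for brevity.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
	"syscall"
)

// resticMessage is a hypothetical type modeled on the JSON objects in the log.
// The nested "error" object looks like an encoded *os.PathError: Op and Path
// are strings, and Err (a syscall.Errno, i.e. a uintptr) is encoded as a number.
type resticMessage struct {
	MessageType string `json:"message_type"`
	Error       struct {
		Op   string  `json:"Op"`
		Path string  `json:"Path"`
		Err  uintptr `json:"Err"`
	} `json:"error"`
	During string `json:"during"`
}

// sampleStderr holds error messages from the log, one JSON object per line
// ("item" omitted for brevity).
const sampleStderr = `{"message_type":"error","error":{"Op":"open","Path":".java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-csb9b-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties","Err":13},"during":"archival"}
{"message_type":"error","error":{"Op":"open","Path":".java/fonts/11.0.18/fcinfo-1-jenkins-fd5fdf49f-wtkgb-Linux-4.18.0-372.43.1.el8_6.x86_64-en.properties","Err":13},"during":"archival"}
{"message_type":"error","error":{"Op":"open","Path":"identity.key.enc","Err":13},"during":"archival"}`

// permissionDenied returns the Path of every error message whose errno is EACCES.
func permissionDenied(stderr string) ([]string, error) {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(stderr))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // Restic lines can be long
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var m resticMessage
		if err := json.Unmarshal([]byte(line), &m); err != nil {
			return nil, fmt.Errorf("decoding %q: %w", line, err)
		}
		if m.MessageType == "error" && syscall.Errno(m.Error.Err) == syscall.EACCES {
			paths = append(paths, m.Error.Path)
		}
	}
	return paths, sc.Err()
}

func main() {
	paths, err := permissionDenied(sampleStderr)
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		// syscall.EACCES prints as "permission denied" on Linux.
		fmt.Printf("%s: %v\n", p, syscall.EACCES)
	}
}
```

Running this against the full stderr from the Velero log would show whether every failure is a permission error or whether other errnos are mixed in.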
Version-Release number of selected component (if applicable):
OADP 1.1.3
Actual results:
The Restic backup fails and remains in a partially completed state without advancing.
Expected results:
Volume backup to complete as expected.
Additional info:
Because this behavior was suspected to result from leftover components of previous OADP versions, the operator was completely uninstalled and then reinstalled at the current version. The same behavior occurred.
The customer is using AWS S3 as backup storage.
The OADP namespace was retained during the operator uninstall/reinstall.
Tiger's notes from reviewing the must-gather:
PV Definition
---
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.trident.netapp.io
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/csi-trident-netapp-io
  name: pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: jenkins-pv-claim
    namespace: devops-tools
    resourceVersion: "1453757966"
    uid: 2dfcb154-b20f-42a2-915d-7427741fc1c9
  csi:
    driver: csi.trident.netapp.io
    volumeAttributes:
      backendUUID: edabf770-8e67-4c1d-b647-803b5cbd2790
      internalName: trident_pvc_2dfcb154_b20f_42a2_915d_7427741fc1c9
      name: pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9
      protocol: file
      storage.kubernetes.io/csiProvisionerIdentity: 1678969996682-8081-csi.trident.netapp.io
    volumeHandle: pvc-2dfcb154-b20f-42a2-915d-7427741fc1c9
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ontap-nas
  volumeMode: Filesystem
status:
  phase: Bound
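The PV above is provisioned by Trident's ontap-nas driver with protocol: file, so the volume is NFS-backed. The Restic pods typically read volume data as root; if the NFS export squashes root (the scenario on the linked MTC troubleshooting page), the server treats those reads as an anonymous user, and files not readable by that user fail with EACCES, matching "Err":13 in the log. Whether root squashing is enabled on this export is not confirmed in this report. The following minimal Go model of that permission check assumes root is mapped to UID/GID 65534; the file ownership and modes are hypothetical, since the real ownership of the Jenkins files is not recorded here. The file, cred, squash, and canRead names are illustrative only.

```go
package main

import (
	"fmt"
	"io/fs"
)

// anonID is the UID/GID that root is assumed to be squashed to
// (65534, "nobody", is a common default; assumption).
const anonID uint32 = 65534

// file is a hypothetical, simplified view of a file's ownership and mode bits.
type file struct {
	UID, GID uint32
	Mode     fs.FileMode
}

// cred is a hypothetical, simplified view of the credentials a client presents.
type cred struct {
	UID, GID uint32
	Groups   []uint32 // supplementary groups
}

// squash models root squashing: UID 0 and GID 0 (including in the
// supplementary groups) are replaced by anonID; other IDs pass through.
func squash(c cred) cred {
	mapID := func(id uint32) uint32 {
		if id == 0 {
			return anonID
		}
		return id
	}
	out := cred{UID: mapID(c.UID), GID: mapID(c.GID)}
	for _, g := range c.Groups {
		out.Groups = append(out.Groups, mapID(g))
	}
	return out
}

// canRead applies the classic owner/group/other read check.
// An unsquashed UID 0 is treated as bypassing the mode bits.
func canRead(f file, c cred) bool {
	perm := f.Mode.Perm()
	inGroup := c.GID == f.GID
	for _, g := range c.Groups {
		if g == f.GID {
			inGroup = true
		}
	}
	switch {
	case c.UID == 0:
		return true
	case c.UID == f.UID:
		return perm&0o400 != 0
	case inGroup:
		return perm&0o040 != 0
	default:
		return perm&0o004 != 0
	}
}

func main() {
	// Hypothetical owner and mode for a file such as identity.key.enc.
	key := file{UID: 1000, GID: 1000, Mode: 0o640}
	root := cred{UID: 0, GID: 0}
	rootWithGroup := cred{UID: 0, GID: 0, Groups: []uint32{1000}}

	fmt.Println(canRead(key, root))                  // true: no squashing
	fmt.Println(canRead(key, squash(root)))          // false: root became nobody
	fmt.Println(canRead(key, squash(rootWithGroup))) // true: group read bit applies
}
```

The linked MTC troubleshooting page addresses this class of error by giving the Restic pods a supplemental group that matches the files' group, which corresponds to the third case above. In this model the group route only helps when the file's group read bit is set; with a mode such as 0600, canRead still returns false.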
Relevant upstream issues and documentation:
https://github.com/NetApp/trident/issues/561
https://github.com/NetApp/trident/issues/269#issuecomment-523884992
https://github.com/openshift/oadp-operator/issues/133
https://github.com/openshift/oadp-operator/issues/179
https://docs.openshift.com/container-platform/4.13/migration_toolkit_for_containers/troubleshooting-mtc.html#restic-permission-error-when-migrating-from-nfs-storage-with-root-squash-enabled_troubleshooting-mtc
Is related to: OADP-4677 DPA.spec.configuration.restic needs to be updated to .nodeAgent (Closed)