OpenShift API for Data Protection / OADP-2504

DataMover: Application pods are failing due to permission issue

Type: Bug
Priority: Blocker
Resolution: Not a Bug
Affects Version: OADP 1.1.6
Component: volsync

      Description of problem:

      After a successful DataMover restore, application pod(s) enter the CrashLoopBackOff state on OCP 4.14.

      Version-Release number of selected component (if applicable):
      OCP 4.14
      oadp-operator-bundle-container-1.1.6-8
      volsync-product.v0.7.4   VolSync         0.7.4     volsync-product.v0.7.3   Succeeded

       

      How reproducible:

      Always

       

      Steps to Reproduce:
      1. Create a DPA with dataMover enabled

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ts-dpa
      spec:
        backupLocations:
        - velero:
            default: true
            objectStorage:
              bucket: oadpbucket227925
              prefix: velero
            provider: gcp
        configuration:
          velero:
            defaultPlugins:
            - gcp
            - openshift
            - csi
        features:
          dataMover:
            enable: true
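
      Before moving on it is worth confirming that the DPA reconciled and that the data mover controller came up; a minimal check could look like this (exact deployment names may differ between OADP releases):

      $ oc get dpa ts-dpa -n openshift-adp -o yaml    # status.conditions should report Reconciled=True
      $ oc get deployment -n openshift-adp            # velero plus, with dataMover enabled, a volume-snapshot-mover deployment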

      2. Deploy ocp-django application
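
      For reference, the application can be deployed from the OpenShift sample template (template name assumed here; any stateful application with a persistent volume shows the same behaviour):

      $ oc new-project test3
      $ oc new-app django-psql-persistent -n test3
      $ oc get pvc -n test3    # wait for the postgresql PVC to reach Bound before taking the backup
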
      3. Execute backup with DataMover

      $ oc get backup test-backup2 -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.27.4+deb2c60
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "27"
        creationTimestamp: "2023-08-24T11:46:03Z"
        generation: 7
        labels:
          velero.io/storage-location: ts-dpa-1
        name: test-backup2
        namespace: openshift-adp
        resourceVersion: "226550"
        uid: 198f2998-03e8-4ad1-bc19-c1108d7f4cce
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToRestic: false
        includedNamespaces:
        - test3
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2023-08-24T11:50:11Z"
        csiVolumeSnapshotsAttempted: 1
        csiVolumeSnapshotsCompleted: 1
        expiration: "2023-09-23T11:46:03Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 98
          totalItems: 98
        startTimestamp: "2023-08-24T11:46:03Z"
        version: 1
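
      Note that with the 1.1.x data mover the Backup reaching Completed only covers the Velero part; the snapshot data is moved asynchronously through VolumeSnapshotBackup CRs and VolSync ReplicationSources. A rough way to confirm the data movement finished (listed across all namespaces, since the namespace the CRs land in varies by version):

      $ oc get volumesnapshotbackup -A     # all VSBs should report a completed phase
      $ oc get replicationsource -A        # VolSync resources created by the data mover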

      4. Delete app namespace

      5. Execute Restore

      $ oc get restore test-restore4 -o yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2023-08-24T11:52:56Z"
        generation: 10
        managedFields:
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              .: {}
              f:backupName: {}
          manager: kubectl-create
          operation: Update
          time: "2023-08-24T11:52:56Z"
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              f:excludedResources: {}
            f:status:
              .: {}
              f:completionTimestamp: {}
              f:phase: {}
              f:progress:
                .: {}
                f:itemsRestored: {}
                f:totalItems: {}
              f:startTimestamp: {}
              f:warnings: {}
          manager: velero-server
          operation: Update
          time: "2023-08-24T11:54:13Z"
        name: test-restore4
        namespace: openshift-adp
        resourceVersion: "229366"
        uid: 66dbc725-0c3d-4274-a8c8-3b2d0f4480db
      spec:
        backupName: test-backup2
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        - csinodes.storage.k8s.io
        - volumeattachments.storage.k8s.io
      status:
        completionTimestamp: "2023-08-24T11:54:13Z"
        phase: Completed
        progress:
          itemsRestored: 52
          totalItems: 52
        startTimestamp: "2023-08-24T11:52:56Z"
        warnings: 4
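
      On restore the data mover first repopulates the PVCs through VolumeSnapshotRestore CRs and VolSync ReplicationDestinations; their state can be checked with something like:

      $ oc get volumesnapshotrestore -A      # data mover restore CRs
      $ oc get replicationdestination -A     # VolSync resources driving the data copy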

      6. Check application pods after restore

      $ oc get pod -n test3 
      NAME                              READY   STATUS             RESTARTS     AGE
      django-psql-persistent-1-build    1/1     Running            0            41s
      django-psql-persistent-1-deploy   1/1     Running            0            41s
      django-psql-persistent-1-mvbnk    0/1     Running            0            38s
      postgresql-1-deploy               1/1     Running            0            43s
      postgresql-1-gbvs8                0/1     CrashLoopBackOff   1 (9s ago)   40s

      Actual results:

      The restored postgresql pod goes into CrashLoopBackOff; its log shows a permission error:

      $ oc logs -ntest3 postgresql-1-gbvs8
      chmod: changing permissions of '/var/lib/pgsql/data/userdata': Operation not permitted
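
      The chmod failure suggests the restored data is owned by IDs the pod no longer runs with; the pod's SCC-assigned security context and the namespace's ID ranges can be compared with, for example:

      $ oc get pod postgresql-1-gbvs8 -n test3 -o jsonpath='{.spec.securityContext}'
      $ oc get ns test3 -o yaml | grep 'sa.scc'    # uid-range / supplemental-groups assigned to the namespace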

      Expected results:
      The application pods should come back up after the restore without entering CrashLoopBackOff.

       

      Additional info:

      Permissions before taking the backup:

      $ oc get pod -n ocp-django
      NAME                              READY   STATUS      RESTARTS   AGE
      django-psql-persistent-1-build    0/1     Completed   0          7m14s
      django-psql-persistent-1-deploy   0/1     Completed   0          6m18s
      django-psql-persistent-1-r8xwh    1/1     Running     0          6m15s
      postgresql-1-deploy               0/1     Completed   0          7m12s
      postgresql-1-rmwzp                1/1     Running     0          7m9s
      [prajoshi@localhost OADP]$ oc rsh -n ocp-django postgresql-1-rmwzp
      sh-5.1$ df -hT
      Filesystem     Type     Size  Used Avail Use% Mounted on
      overlay        overlay  128G   13G  115G  10% /
      tmpfs          tmpfs     64M     0   64M   0% /dev
      shm            tmpfs     64M   16K   64M   1% /dev/shm
      tmpfs          tmpfs    3.2G   62M  3.1G   2% /etc/passwd
      /dev/sda4      xfs      128G   13G  115G  10% /etc/hosts
      /dev/sdb       ext4     974M   49M  910M   6% /var/lib/pgsql/data
      tmpfs          tmpfs    512M   20K  512M   1% /run/secrets/kubernetes.io/serviceaccount
      tmpfs          tmpfs    7.9G     0  7.9G   0% /proc/acpi
      tmpfs          tmpfs    7.9G     0  7.9G   0% /proc/scsi
      tmpfs          tmpfs    7.9G     0  7.9G   0% /sys/firmware
      sh-5.1$ ls -lh /var/lib/pgsql/data
      total 20K
      drwxrws---.  2 root       1000820000  16K Aug 24 10:18 lost+found
      drwx------. 20 1000820000 1000820000 4.0K Aug 24 10:18 userdata 

      Permissions after the restore:

      $ ls -lh /var/lib/pgsql/data
      total 20K
      drwxrws---.  2 root       1000840000  16K Aug 24 12:17 lost+found
      drwxrws---. 20 1000830000 1000840000 4.0K Aug 24 11:05 userdata 
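
      The ownership shift (group 1000820000 before the backup vs 1000830000/1000840000 after the restore) is consistent with OpenShift assigning a fresh UID/GID range to the recreated namespace; the assigned range is visible in the namespace annotations, e.g.:

      $ oc get ns test3 -o yaml | grep 'sa.scc'
      # openshift.io/sa.scc.uid-range and openshift.io/sa.scc.supplemental-groups define the IDs the
      # restored pods run with; if these differ from the ownership preserved on the PV data, the
      # postgresql image's chmod of /var/lib/pgsql/data/userdata fails with "Operation not permitted"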

      Assignee: Wes Hayutin (wnstb)
      Reporter: Prasad Joshi (rhn-support-prajoshi)