OpenShift API for Data Protection / OADP-2504

DataMover: Application pods are failing due to permission issue

Type: Bug
Priority: Blocker
Resolution: Not a Bug
Affects Version: OADP 1.1.6
Component: volsync

      Description of problem:

      After a successful DataMover restore, application pod(s) enter the CrashLoopBackOff state on OCP 4.14.

      Version-Release number of selected component (if applicable):
      OCP 4.14
      oadp-operator-bundle-container-1.1.6-8
      volsync-product.v0.7.4   VolSync         0.7.4     volsync-product.v0.7.3   Succeeded

       

      How reproducible:

      Always

       

      Steps to Reproduce:
      1. Create a DPA with dataMover enabled

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ts-dpa
      spec:
        backupLocations:
        - velero:
            default: true
            objectStorage:
              bucket: oadpbucket227925
              prefix: velero
            provider: gcp
        configuration:
          velero:
            defaultPlugins:
            - gcp
            - openshift
            - csi
        features:
          dataMover:
            enable: true
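
      Before moving on it is worth confirming that the DPA reconciled and that the data mover controller came up; a minimal check could look like this (exact deployment names may differ between OADP releases):

      $ oc get dpa ts-dpa -n openshift-adp -o yaml    # status.conditions should report Reconciled=True
      $ oc get deployment -n openshift-adp            # velero plus, with dataMover enabled, a volume-snapshot-mover deployment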

      2. Deploy ocp-django application
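
      For reference, the application can be deployed from the OpenShift sample template (template name assumed here; any stateful application with a persistent volume shows the same behaviour):

      $ oc new-project test3
      $ oc new-app django-psql-persistent -n test3
      $ oc get pvc -n test3    # wait for the postgresql PVC to reach Bound before taking the backup
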
      3. Execute backup with DataMover

      $ oc get backup test-backup2 -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.27.4+deb2c60
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "27"
        creationTimestamp: "2023-08-24T11:46:03Z"
        generation: 7
        labels:
          velero.io/storage-location: ts-dpa-1
        name: test-backup2
        namespace: openshift-adp
        resourceVersion: "226550"
        uid: 198f2998-03e8-4ad1-bc19-c1108d7f4cce
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToRestic: false
        includedNamespaces:
        - test3
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2023-08-24T11:50:11Z"
        csiVolumeSnapshotsAttempted: 1
        csiVolumeSnapshotsCompleted: 1
        expiration: "2023-09-23T11:46:03Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 98
          totalItems: 98
        startTimestamp: "2023-08-24T11:46:03Z"
        version: 1
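
      Note that with the 1.1.x data mover the Backup reaching Completed only covers the Velero part; the snapshot data is moved asynchronously through VolumeSnapshotBackup CRs and VolSync ReplicationSources. A rough way to confirm the data movement finished (listed across all namespaces, since the namespace the CRs land in varies by version):

      $ oc get volumesnapshotbackup -A     # all VSBs should report a completed phase
      $ oc get replicationsource -A        # VolSync resources created by the data mover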

      4. Delete app namespace

      5. Execute Restore

      $ oc get restore test-restore4 -o yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2023-08-24T11:52:56Z"
        generation: 10
        managedFields:
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              .: {}
              f:backupName: {}
          manager: kubectl-create
          operation: Update
          time: "2023-08-24T11:52:56Z"
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              f:excludedResources: {}
            f:status:
              .: {}
              f:completionTimestamp: {}
              f:phase: {}
              f:progress:
                .: {}
                f:itemsRestored: {}
                f:totalItems: {}
              f:startTimestamp: {}
              f:warnings: {}
          manager: velero-server
          operation: Update
          time: "2023-08-24T11:54:13Z"
        name: test-restore4
        namespace: openshift-adp
        resourceVersion: "229366"
        uid: 66dbc725-0c3d-4274-a8c8-3b2d0f4480db
      spec:
        backupName: test-backup2
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        - csinodes.storage.k8s.io
        - volumeattachments.storage.k8s.io
      status:
        completionTimestamp: "2023-08-24T11:54:13Z"
        phase: Completed
        progress:
          itemsRestored: 52
          totalItems: 52
        startTimestamp: "2023-08-24T11:52:56Z"
        warnings: 4
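
      On restore the data mover first repopulates the PVCs through VolumeSnapshotRestore CRs and VolSync ReplicationDestinations; their state can be checked with something like:

      $ oc get volumesnapshotrestore -A      # data mover restore CRs
      $ oc get replicationdestination -A     # VolSync resources driving the data copy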

      6. Check application pods after restore

      $ oc get pod -n test3 
      NAME                              READY   STATUS             RESTARTS     AGE
      django-psql-persistent-1-build    1/1     Running            0            41s
      django-psql-persistent-1-deploy   1/1     Running            0            41s
      django-psql-persistent-1-mvbnk    0/1     Running            0            38s
      postgresql-1-deploy               1/1     Running            0            43s
      postgresql-1-gbvs8                0/1     CrashLoopBackOff   1 (9s ago)   40s

      Actual results:

      The restored postgresql pod goes into CrashLoopBackOff; its log shows a permission error:

      $ oc logs -ntest3 postgresql-1-gbvs8
      chmod: changing permissions of '/var/lib/pgsql/data/userdata': Operation not permitted
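
      The chmod failure suggests the restored data is owned by IDs the pod no longer runs with; the pod's SCC-assigned security context and the namespace's ID ranges can be compared with, for example:

      $ oc get pod postgresql-1-gbvs8 -n test3 -o jsonpath='{.spec.securityContext}'
      $ oc get ns test3 -o yaml | grep 'sa.scc'    # uid-range / supplemental-groups assigned to the namespace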

      Expected results:
      The application pods should come back up after the restore without entering CrashLoopBackOff.

       

      Additional info:

      Permissions before taking the backup:

      $ oc get pod -n ocp-django
      NAME                              READY   STATUS      RESTARTS   AGE
      django-psql-persistent-1-build    0/1     Completed   0          7m14s
      django-psql-persistent-1-deploy   0/1     Completed   0          6m18s
      django-psql-persistent-1-r8xwh    1/1     Running     0          6m15s
      postgresql-1-deploy               0/1     Completed   0          7m12s
      postgresql-1-rmwzp                1/1     Running     0          7m9s
      [prajoshi@localhost OADP]$ oc rsh -n ocp-django postgresql-1-rmwzp
      sh-5.1$ df -hT
      Filesystem     Type     Size  Used Avail Use% Mounted on
      overlay        overlay  128G   13G  115G  10% /
      tmpfs          tmpfs     64M     0   64M   0% /dev
      shm            tmpfs     64M   16K   64M   1% /dev/shm
      tmpfs          tmpfs    3.2G   62M  3.1G   2% /etc/passwd
      /dev/sda4      xfs      128G   13G  115G  10% /etc/hosts
      /dev/sdb       ext4     974M   49M  910M   6% /var/lib/pgsql/data
      tmpfs          tmpfs    512M   20K  512M   1% /run/secrets/kubernetes.io/serviceaccount
      tmpfs          tmpfs    7.9G     0  7.9G   0% /proc/acpi
      tmpfs          tmpfs    7.9G     0  7.9G   0% /proc/scsi
      tmpfs          tmpfs    7.9G     0  7.9G   0% /sys/firmware
      sh-5.1$ ls -lh /var/lib/pgsql/data
      total 20K
      drwxrws---.  2 root       1000820000  16K Aug 24 10:18 lost+found
      drwx------. 20 1000820000 1000820000 4.0K Aug 24 10:18 userdata 

      Permissions after the restore:

      $ ls -lh /var/lib/pgsql/data
      total 20K
      drwxrws---.  2 root       1000840000  16K Aug 24 12:17 lost+found
      drwxrws---. 20 1000830000 1000840000 4.0K Aug 24 11:05 userdata 
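
      The ownership shift (group 1000820000 before the backup vs 1000830000/1000840000 after the restore) is consistent with OpenShift assigning a fresh UID/GID range to the recreated namespace; the assigned range is visible in the namespace annotations, e.g.:

      $ oc get ns test3 -o yaml | grep 'sa.scc'
      # openshift.io/sa.scc.uid-range and openshift.io/sa.scc.supplemental-groups define the IDs the
      # restored pods run with; if these differ from the ownership preserved on the PV data, the
      # postgresql image's chmod of /var/lib/pgsql/data/userdata fails with "Operation not permitted"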

      Assignee: Wes Hayutin (wnstb)
      Reporter: Prasad Joshi (rhn-support-prajoshi)