OpenShift API for Data Protection
OADP-2077

DataMover backup (cephFS) - volsync-src-vsb pod with "Error" status.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • OADP 1.2.6
    • OADP 1.2.0
    • data-mover

      Description of problem:

      Deployed the OADP stage build and ran Data Mover backup and restore.
      With Ceph RBD, backup and restore both finished with 'Completed' status.

      With CephFS, some of the "volsync-src-vsb" pods ended in 'Error' status during the backup.

      1. oc logs volsync-src-vsb-br9vl-rep-src-npgbp
        Starting container
        VolSync restic container version: ACM-0.7.1-8af0bf0
        backup
        restic 0.15.1 compiled with go1.19.6 on linux/amd64
        Testing mandatory env variables
        == Checking directory for content ===
        ls: cannot open directory '/data': Permission denied
      2. oc describe pod volsync-src-vsb-br9vl-rep-src-npgbp
        Containers:
          restic:
            Container ID:  cri-o://73b24708dd2137b9ac75906d4b314c24ee69ce0c0232cf4096434d474b9e26b8
            Image:         registry.redhat.io/rhacm2/volsync-rhel8@sha256:7207ea4de4a8bb3a2930b974c2122215cb902ab577e4ef1de6e635fd854b6d0a
            Image ID:      registry.redhat.io/rhacm2/volsync-rhel8@sha256:7207ea4de4a8bb3a2930b974c2122215cb902ab577e4ef1de6e635fd854b6d0a
            Port:          <none>
            Host Port:     <none>
            Command:
              /mover-restic/entry.sh
            Args:
              backup
            State:          Terminated
              Reason:       Error
              Exit Code:    2
              Started:      Thu, 08 Jun 2023 12:36:44 +0000
              Finished:     Thu, 08 Jun 2023 12:36:44 +0000
            Ready:          False
            Restart Count:  0
            Environment:
              FORGET_OPTIONS:                    --keep-last 1
              DATA_DIR:                          /data
              RESTIC_CACHE_DIR:                  /cache
              RESTORE_AS_OF:
              SELECT_PREVIOUS:                   0

          Mounts:
            /cache from cache (rw)
            /data from data (rw)
            /tmp from tempdir (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sz4jb (ro)

      Events:
        Type     Reason                  Age   From                     Message
        ----     ------                  ----  ----                     -------
        Normal   Scheduled               2m3s  default-scheduler        Successfully assigned openshift-adp/volsync-src-vsb-br9vl-rep-src-npgbp to worker003-r640
        Warning  FileSystemResizeFailed  2m3s  kubelet                  MountVolume.NodeExpandVolume failed for volume "pvc-04c18c0d-b4cc-49dc-8868-59b83d27fac0" requested read-only file system
        Normal   SuccessfulAttachVolume  2m3s  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-f0e153b3-0c42-4373-9922-e5823cb787c9"
        Normal   AddedInterface          115s  multus                   Add eth0 [10.128.5.186/23] from openshift-sdn
        Normal   Pulled                  115s  kubelet                  Container image "registry.redhat.io/rhacm2/volsync-rhel8@sha256:7207ea4de4a8bb3a2930b974c2122215cb902ab577e4ef1de6e635fd854b6d0a" already present on machine
        Normal   Created                 115s  kubelet                  Created container restic
        Normal   Started                 115s  kubelet                  Started container restic
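
The log above stops at the directory listing of /data, so the first thing to establish is which user the container runs as and what ownership and mode the mount point has. The following is a minimal diagnostic sketch, assuming a shell is available in a pod that mounts the same volume; `check_dir_access` is a hypothetical helper written for this report and is not the logic of VolSync's entry.sh. It relies on GNU `stat`.

```shell
# Hypothetical helper: report the current identity and whether a directory
# can be listed. Diagnostic sketch only, not VolSync's entry.sh.
check_dir_access() {
    dir="$1"
    echo "running as uid=$(id -u) gid=$(id -g) groups=$(id -G)"
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory"
        return 1
    fi
    # Owner uid, group gid, and octal mode of the directory (GNU stat).
    echo "$dir: $(stat -c 'owner=%u group=%g mode=%a' "$dir")"
    if ls -A "$dir" >/dev/null 2>&1; then
        echo "$dir: listable"
    else
        echo "$dir: cannot be listed by this user"
        return 2
    fi
}
```

Running `check_dir_access /data` in such a pod would show whether the uid or supplemental groups of the process match the ownership of the CephFS mount.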

      [root@f01-h07-000-r640 backup]# oc get pods

      NAME                                                READY   STATUS    RESTARTS   AGE
      openshift-adp-controller-manager-74bb4d7cd6-7xlgw   1/1     Running   0          29m
      velero-7ff77489bf-qkh9h                             1/1     Running   0          27m
      volsync-src-vsb-2xx72-rep-src-67z29                 0/1     Error     0          4m21s
      volsync-src-vsb-2xx72-rep-src-8kvj6                 0/1     Error     0          4m39s
      volsync-src-vsb-2xx72-rep-src-b2mp4                 0/1     Error     0          2m59s
      volsync-src-vsb-2xx72-rep-src-mqpc6                 0/1     Error     0          88s
      volsync-src-vsb-2xx72-rep-src-nk6ct                 0/1     Error     0          4m48s
      volsync-src-vsb-2xx72-rep-src-rpzlq                 0/1     Error     0          3m47s
      volsync-src-vsb-5m94b-rep-src-fxcvg                 0/1     Error     0          3m
      volsync-src-vsb-5m94b-rep-src-mnsvj                 0/1     Error     0          3m47s
      volsync-src-vsb-5m94b-rep-src-q4t22                 0/1     Error     0          4m48s
      volsync-src-vsb-5m94b-rep-src-rf9tk                 0/1     Error     0          4m39s
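
With around 100 mover pods, picking the failed ones out of the listing by eye is error-prone. A small sketch that filters the table output shown above; `error_pods` is a hypothetical helper name, and it assumes the default `oc get pods` column order (NAME, READY, STATUS, ...).

```shell
# Print the names of pods whose STATUS column is "Error".
# Reads `oc get pods` table output on stdin; the header row, if present,
# is skipped because its first field is "NAME".
error_pods() {
    awk '$1 != "NAME" && $3 == "Error" { print $1 }'
}

# Example use (not run here): save the log of every failed pod.
#   oc get pods -n openshift-adp | error_pods | while read -r pod; do
#       oc logs -n openshift-adp "$pod" > "$pod.log"
#   done
```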

      Version-Release number of selected component (if applicable):

      OCP 4.12.9
      ODF 4.12.3
      OADP 1.2.0 Stage
      CephFS

      How reproducible:

      Steps to Reproduce:
      1. Create a namespace with 100 pods (6G PV each, 2G used) on CephFS.
      2. Run the Data Mover backup flow.
      3. Monitor the pods in the openshift-adp namespace.
      4. Some of the volsync-src-vsb pods end in 'Error' status.
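
The monitoring step above can be made less manual by tallying pods per status instead of reading the full listing. This is a sketch under the same assumption about the default `oc get pods` column order; `status_tally` is a hypothetical helper name.

```shell
# Count pods per STATUS value from `oc get pods` table output on stdin.
# Lines with fewer than three fields (for example blank lines) and the
# header row are ignored. Output is one "STATUS COUNT" pair per line, sorted.
status_tally() {
    awk '$1 != "NAME" && NF >= 3 { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

For example, `oc get pods -n openshift-adp | status_tally` could be rerun periodically during the backup to see how many mover pods are in each status.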

      Actual results:

      Backup failed. Some of the volsync-src-vsb pods are in 'Error' status.

      Expected results:

      Backup completed

      Additional info:

      About 50% of the volsync-src-vsb pods succeeded.
      With OADP 1.2.0 builds 78 and 79, the CephFS backup completed successfully.

        1. stage-dm-backup-fs.tar
          170 kB
        2. stage-dm-backup-fs-100pods-2g.tar
          2.26 MB
        3. storagelogs.tar
          5.55 MB
        4. vsm.log.xz
          1.07 MB

            Assignee: Scott Seago (sseago)
            Reporter: David Vaanunu (dvaanunu@redhat.com)