OpenShift API for Data Protection / OADP-621

Failed to restore more than 1000 pods in single namespace on OCP 4.11

      Description of problem:

      Running restore of:

      1. 1000 pods in a single namespace ==> 948 pods were restored
      2. 1500 pods in a single namespace ==> 677 pods were restored
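
To put the two runs side by side, the shortfall can be worked out directly from the figures above. This is only an illustrative sketch: it assumes every pod in the namespace was expected to come back, and the helper name `shortfall` is made up for this note.

```python
# Figures taken from the report: (pods in the namespace, pods restored).
runs = {"1000 pods": (1000, 948), "1500 pods": (1500, 677)}


def shortfall(expected, restored):
    """Return (missing pod count, missing share in percent, one decimal)."""
    missing = expected - restored
    return missing, round(100 * missing / expected, 1)


for name, (expected, restored) in runs.items():
    missing, pct = shortfall(expected, restored)
    print(f"{name}: {missing} missing ({pct}% of the namespace)")
```

The loss is not proportional to the pod count: the larger run loses a much bigger share of its pods than the smaller one.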

      Version-Release number of selected component (if applicable):

      Scale-Env: cloud33 (3 masters & 12 workers)

      OCP: 4.11.0-rc.0
      OADP: 1.1.0-30 (iib)276825

      How reproducible:

       

      Steps to Reproduce:
      1. Run the backup
      2. Delete the backed-up namespace
      3. Run the restore

      Actual results:

      Not all pods were restored.

      Velero log:
      time="2022-07-20T11:29:26Z" level=error msg="Namespace busybox-perf-single-ns-1500-pods, resource restore error: error preparing pods/busybox-perf-single-ns-1500-pods/busybox-perf-single-ns-1500-pods-997: rpc error: code = Unavailable desc = error reading from server: EOF" logSource="pkg/controller/restore_controller.go:504" restore=openshift-adp/restore-1500pods
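
To see which pods hit this error across the whole Velero log, the lines can be parsed as key=value pairs and filtered. The following is a rough sketch, not part of Velero or OADP: the function names `parse_logfmt` and `failed_pods` are hypothetical, and the regular expression assumes every failing pod is reported with the same "error preparing pods/<namespace>/<name>:" wording as the line above.

```python
import re
import shlex

# Assumed message shape, copied from the single log line in this report.
POD_ERROR = re.compile(r"error preparing pods/(?P<ns>[^/\s]+)/(?P<pod>[^:\s]+):")


def parse_logfmt(line):
    """Split one key=value log line into a dict; quoted values stay whole."""
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}  # unbalanced quotes: skip the line rather than guess
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failed_pods(lines, restore_name):
    """Collect (namespace, pod) pairs named in error lines of one restore."""
    failed = set()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") != "error":
            continue
        if fields.get("restore") != restore_name:
            continue
        match = POD_ERROR.search(fields.get("msg", ""))
        if match:
            failed.add((match.group("ns"), match.group("pod")))
    return failed


# The log line quoted in this report, used as sample input.
SAMPLE = (
    'time="2022-07-20T11:29:26Z" level=error msg="Namespace '
    'busybox-perf-single-ns-1500-pods, resource restore error: error preparing '
    'pods/busybox-perf-single-ns-1500-pods/busybox-perf-single-ns-1500-pods-997: '
    'rpc error: code = Unavailable desc = error reading from server: EOF" '
    'logSource="pkg/controller/restore_controller.go:504" '
    'restore=openshift-adp/restore-1500pods'
)

print(sorted(failed_pods([SAMPLE], "openshift-adp/restore-1500pods")))
```

Fed the lines of the decompressed velero log attachment instead of `SAMPLE`, the same call would give the set of pods that failed with this error, which can then be compared against the pods missing from the namespace.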

      Expected results:

      All pods are restored.

      Additional info:

      Storage: OCS rbd

      backup.yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: backup-1500pods
        labels:
          velero.io/storage-location: default
        namespace: openshift-adp
      spec:
        hooks: {}
        includedNamespaces:
        - busybox-perf-single-ns-1500-pods
        storageLocation: velero-1
        ttl: 720h0m0s

      restore.yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
       name: restore-1500pods
       namespace: openshift-adp
      spec:
       backupName: backup-1500pods
       excludedResources:
       - nodes
       - events
       - events.events.k8s.io
       - backups.velero.io
       - restores.velero.io
       - resticrepositories.velero.io
       restorePVs: true

      dpa.yaml
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: velero
        namespace: openshift-adp
      spec:
        configuration:
          velero:
            defaultPlugins:
              - aws
              - openshift
              - csi
          restic:
            enable: false
          featureFlags:
          - EnableCSI
        backupLocations:
          - name: default
            velero:
              config:
                profile: noobaa
                region: noobaa
                s3Url: https://s3-openshift-storage.apps.oadp348.cloud33mpqe.com
                insecureSkipTLSVerify: "true"
                s3ForcePathStyle: "true"
              provider: aws
              default: true
              credential:
                key: cloud
                name: cloud-credentials
              objectStorage:
                bucket: cloud33-oadp-bucket
                prefix: velero

      Attachments:
        1. controller-1500pods_3.log.gz (303 kB)
        2. events_yaml.log.gz (147 kB)
        3. restore-1500pods.png (73 kB)
        4. velero-1500pods_3.log.gz (4.13 MB)

            sseago Scott Seago
            dvaanunu@redhat.com David Vaanunu