  1. OpenShift API for Data Protection
  2. OADP-1224

Backup using "restic" failed on a single namespace with 5k pods with error: "daemonset pod not found in running state in node worker"


    • Bug
    • Resolution: Unresolved
    • Major
    • OADP 1.2.5
    • OADP 1.2.0
    • restic
    • True
    • False
    • QE - Ack
    • ToDo
    • Yes
    • 0
    • 0
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown

      While verifying https://issues.redhat.com/browse/OADP-1152, I applied the following workaround:

      Try the workaround mentioned here: https://github.com/vmware-tanzu/velero/issues/4421#issuecomment-1035998901
      Run `oc get resticrepository -n openshift-adp` to see which repository is relevant, then run
      `oc delete resticrepository <repository name> -n openshift-adp`.
      This workaround is needed whenever the S3 bucket is cleared manually and Velero does not automatically recreate the directories it expects.

      The backup using "restic" failed again, this time with a different error, on a single namespace with 5000 pods, PVC size 32MB,
      on "ocs-storagecluster-ceph-rbd".
      OADP version: iiib:412677 - v1.2.0, Velero v1.9.2-OADP

      The error:
      msg="Error backing up item" backup=openshift-adp/adp-1152 error="daemonset pod not found in running state in node worker002-r650"
       

      The error from the Velero backup logs:

       time="2023-01-18T14:48:02Z" level=error msg="Error backing up item" backup=openshift-adp/adp-1152 error="daemonset pod not found in running state in node worker002-r650" error.file="/remote-source/velero/app/pkg/nodeagent/node_agent.go:74" error.function=github.com/vmware-tanzu/velero/pkg/nodeagent.IsRunningInNode logSource="/remote-source/velero/app/pkg/backup/backup.go:425" name=busybox-perf-single-ns-5000-pods-993
      time="2023-01-18T14:48:12Z" level=info msg="1 errors encountered backup up item" backup=openshift-adp/adp-1152 logSource="/remote-source/velero/app/pkg/backup/backup.go:421" name=busybox-perf-single-ns-5000-pods-996
      time="2023-01-18T14:48:12Z" level=error msg="Error backing up item" backup=openshift-adp/adp-1152 error="daemonset pod not found in running state in node worker002-r650" error.file="/remote-source/velero/app/pkg/nodeagent/node_agent.go:74" error.function=github.com/vmware-tanzu/velero/pkg/nodeagent.IsRunningInNode logSource="/remote-source/velero/app/pkg/backup/backup.go:425" name=busybox-perf-single-ns-5000-pods-996
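Since the backup reports 443 errors, it may help to check whether they all point at the same node. The following is a minimal sketch, assuming the backup log has been saved to a file (for example with `velero backup logs adp-1152`) and that every relevant line carries the message text shown above. The function name `errors_by_node` is hypothetical and is not part of Velero or `oc`.

```shell
# Hypothetical helper: tally "daemonset pod not found" errors per node
# from a saved Velero backup log file.
# Usage: errors_by_node adp-1152_velero.log
errors_by_node() {
  # Keep only error-level lines, then extract the message up to the closing quote.
  grep 'level=error' "$1" \
    | grep -o 'daemonset pod not found in running state in node [^"]*' \
    | awk '{ count[$NF]++ } END { for (node in count) print count[node], node }' \
    | sort -rn
}
```

Each output line is a count followed by a node name, highest count first. If every error names worker002-r650, the problem is probably local to that node rather than to the backup as a whole.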

       

      The log above reports the error for pod busybox-perf-single-ns-5000-pods-996,
      but that pod is in the Running state:

      [root@f07-h27-000-r640 ~]# oc get pod/busybox-perf-single-ns-5000-pods-996 -nbusybox-perf-single-ns-5000-pods -owide
      NAME                                   READY   STATUS    RESTARTS   AGE   IP             NODE             NOMINATED NODE   READINESS GATES
      busybox-perf-single-ns-5000-pods-996   1/1     Running   0          14d   10.130.3.120   worker002-r650   <none>           <none>
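The error message refers to the daemonset pod on the node, not to the workload pod, so a natural follow-up is to list the pods in the OADP namespace that are scheduled on worker002-r650. The sketch below is an assumption about how one might check this. The function name `pods_on_node` is hypothetical, and the name or labels of the daemonset pod are deliberately not assumed, so the output has to be inspected by eye.

```shell
# Hypothetical helper: list all pods in a namespace that run on a given node.
# Usage: pods_on_node openshift-adp worker002-r650
pods_on_node() {
  # spec.nodeName is a supported field selector for pods.
  oc get pods -n "$1" -o wide --field-selector "spec.nodeName=$2"
}
```

If no daemonset pod appears for the node, or it is not in the Running state, that would be consistent with the error reported by the backup.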

      Output of `velero backup describe adp-1152`:

      Name:         adp-1152
      Namespace:    openshift-adp
      Labels:       velero.io/storage-location=example-velero-1
      Annotations:  velero.io/source-cluster-k8s-gitversion=v1.24.0+3882f8f
                    velero.io/source-cluster-k8s-major-version=1
                    velero.io/source-cluster-k8s-minor-version=24
      Phase:  PartiallyFailed (run `velero backup logs adp-1152` for more information)

      Errors:    443
      Warnings:  0

      Namespaces:
        Included:  busybox-perf-single-ns-5000-pods
        Excluded:  <none>

      Resources:
        Included:        *
        Excluded:        <none>
        Cluster-scoped:  auto

      Label selector:  <none>

      Storage Location:  example-velero-1

      Velero-Native Snapshot PVs:  false

      TTL:  720h0m0s

      CSISnapshotTimeout:  10m0s

      Hooks:  <none>

      Backup Format Version:  1.1.0

      Started:    2023-01-18 10:05:31 +0000 UTC
      Completed:  2023-01-18 14:48:44 +0000 UTC

      Expiration:  2023-02-17 10:05:31 +0000 UTC

      Total items to be backed up:  40022
      Items backed up:              40022

      Velero-Native Snapshots: <none included>

      restic Backups (specify --details for more information):
        Completed:  4557

       

      The full logs are attached under the following names:

      1. adp-1152_velero.log, created with `velero backup logs adp-1152`
      2. velero-67b85f9b48-g64st.log, created with `oc logs velero-67b85f9b48-g64st -nopenshift-adp`

       

       

       

            tkaovila@redhat.com Tiger Kaovilai
            tzahia Tzahi Ashkenazi
            Tzahi Ashkenazi Tzahi Ashkenazi