Description of problem:
Noticed an issue while testing the nodeAgent loadAffinity setting. Restore partially fails with an error that the node-agent pod is not found on the specified node. Error attached below:
Velero: node-agent pod is not running in node oadp-137771-bpr9v-worker-b-jhs7p: daemonset pod not found in running state in node oadp-137771-bpr9v-worker-b-jhs7p
Version-Release number of selected component (if applicable):
Deployed OADP via the make deploy command.
How reproducible:
Always
Steps to Reproduce:
1. Added a label to one of the worker nodes
$ oc get nodes -l foo=bar
NAME                               STATUS   ROLES    AGE     VERSION
oadp-137771-bpr9v-worker-a-sgxkh   Ready    worker   5h57m   v1.33.5
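For reference, the label was presumably applied with something like the following (node name taken from the output above):

$ oc label node oadp-137771-bpr9v-worker-a-sgxkh foo=bar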
2. Created a DPA with the nodeAgent loadAffinity spec
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  creationTimestamp: "2025-11-24T11:32:25Z"
  generation: 1
  name: ts-dpa
  namespace: openshift-adp
  resourceVersion: "154981"
  uid: c3d5573f-f237-4eaa-b0d4-c127ff06bafa
spec:
  backupLocations:
  - velero:
      credential:
        key: cloud
        name: cloud-credentials-gcp
      default: true
      objectStorage:
        bucket: oadp137771bpr9v
        prefix: velero-e2e-3ea14aaa-c929-11f0-a5be-5ea249a46217
      provider: gcp
  configuration:
    nodeAgent:
      enable: true
      loadAffinity:
      - nodeSelector:
          matchExpressions:
          - key: foo
            operator: In
            values:
            - bar
      restorePVC:
        ignoreDelayBinding: true
      uploaderType: kopia
    velero:
      defaultPlugins:
      - openshift
      - gcp
      - kubevirt
      - hypershift
      disableFsBackup: false
      logFormat: text
  podDnsConfig: {}
  snapshotLocations: []
status:
  conditions:
  - lastTransitionTime: "2025-11-24T11:32:25Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled
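After the DPA reconciles, this loadAffinity should pin the node-agent daemonset pods to nodes matching foo=bar. A quick way to confirm placement (assuming the upstream daemonset name node-agent and pod label name=node-agent):

$ oc get ds node-agent -n openshift-adp
$ oc get pods -n openshift-adp -l name=node-agent -o wide

With only one node labeled foo=bar, a single node-agent pod is expected, on oadp-137771-bpr9v-worker-a-sgxkh.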
3. Deployed an application on the same node
$ oc get pod -n test-oadp-683 -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
mysql-67fd7fdff6-9vnbd   1/1     Running   0          7m20s   10.128.2.74   oadp-137771-bpr9v-worker-a-sgxkh   <none>           <none>
4. Created an FSB backup
apiVersion: velero.io/v1
kind: Backup
metadata:
  creationTimestamp: "2025-11-24T11:46:40Z"
  generation: 6
  labels:
    velero.io/storage-location: ts-dpa-1
  name: mysql-3f8614d2-c929-11f0-a5be-5ea249a46217
  namespace: openshift-adp
  resourceVersion: "159333"
  uid: 2ddc263d-a492-485c-92a8-c5a75be7ec1c
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: true
  excludedClusterScopedResources:
  - volumesnapshotcontents.snapshot.storage.k8s.io
  excludedNamespaceScopedResources:
  - volumesnapshots.snapshot.storage.k8s.io
  hooks: {}
  includedNamespaces:
  - test-oadp-683
  itemOperationTimeout: 4h0m0s
  metadata: {}
  snapshotMoveData: false
  storageLocation: ts-dpa-1
  ttl: 720h0m0s
  volumeGroupSnapshotLabelKey: velero.io/volume-group
status:
  completionTimestamp: "2025-11-24T11:47:04Z"
  expiration: "2025-12-24T11:46:40Z"
  formatVersion: 1.1.0
  hookStatus: {}
  phase: Completed
  progress:
    itemsBackedUp: 43
    totalItems: 43
  startTimestamp: "2025-11-24T11:46:40Z"
  version: 1
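The per-volume FSB uploads for this backup can be cross-checked via the PodVolumeBackup resources, for example (assuming the standard velero.io/backup-name label):

$ oc get podvolumebackups -n openshift-adp -l velero.io/backup-name=mysql-3f8614d2-c929-11f0-a5be-5ea249a46217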
5. Removed the app namespace
6. Triggered a restore (commands sketched below)
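For reference, steps 5 and 6 amount to something like the following (restore name taken from the output under Actual results):

$ oc delete namespace test-oadp-683
$ velero restore create test-restore -n openshift-adp --from-backup mysql-3f8614d2-c929-11f0-a5be-5ea249a46217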
Actual results:
Restore partially failed with the error that the node-agent pod was not found on the selected node:
$ velero describe restore test-restore -n openshift-adp --details
Name:         test-restore
Namespace:    openshift-adp
Labels:       <none>
Annotations:  <none>

Phase:                       PartiallyFailed (run 'velero restore logs test-restore' for more information)
Total items to be restored:  26
Items restored:              26

Started:    2025-11-24 17:31:45 +0530 IST
Completed:  2025-11-24 17:31:52 +0530 IST

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    test-oadp-683:  could not restore, RoleBinding:system:image-pullers already exists. Warning: the in-cluster version is different than the backed-up version
                    could not restore, ConfigMap:kube-root-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version
                    could not restore, ConfigMap:openshift-service-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version

Errors:
  Velero:     node-agent pod is not running in node oadp-137771-bpr9v-worker-b-jhs7p: daemonset pod not found in running state in node oadp-137771-bpr9v-worker-b-jhs7p
  Cluster:    <none>
  Namespaces: <none>

Backup:  mysql-3f8614d2-c929-11f0-a5be-5ea249a46217
Expected results:
Restore should complete successfully
Additional info:
time="2025-11-24T13:37:27Z" level=error msg="Velero restore error: node-agent pod is not running in node oadp-137771-bpr9v-worker-b-jhs7p: daemonset pod not found in running state in node oadp-137771-bpr9v-worker-b-jhs7p" logSource="pkg/controller/restore_controller.go:602" restore=openshift-adp/test-restore1
$ oc get cm node-agent-ts-dpa -o yaml
apiVersion: v1
data:
  node-agent-config: '{"loadAffinity":[{"nodeSelector":{"matchExpressions":[{"key":"foo","operator":"In","values":["bar"]}]}}],"restorePVC":{"ignoreDelayBinding":true},"privilegedFsBackup":true}'
kind: ConfigMap
metadata:
  creationTimestamp: "2025-11-24T11:32:25Z"
  labels:
    app.kubernetes.io/component: node-agent-config
    app.kubernetes.io/instance: ts-dpa
    app.kubernetes.io/managed-by: oadp-operator
    openshift.io/oadp: "True"
  name: node-agent-ts-dpa
  namespace: openshift-adp
  ownerReferences:
  - apiVersion: oadp.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DataProtectionApplication
    name: ts-dpa
    uid: c3d5573f-f237-4eaa-b0d4-c127ff06bafa
  resourceVersion: "154955"
  uid: 2e422563-a974-410c-9884-1d2df019e229
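Note: the node named in the error, oadp-137771-bpr9v-worker-b-jhs7p, is presumably not labeled foo=bar, so loadAffinity keeps the node-agent daemonset off it; that can be confirmed with:

$ oc get node oadp-137771-bpr9v-worker-b-jhs7p --show-labels

This suggests the restore's FSB path selected a node outside the loadAffinity node selection, where no node-agent pod can run.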