Bug
Resolution: Unresolved
Description of problem:
https://github.com/migtools/openshift-migration-plugin/pull/2
The NodeSelector part is stripped from the pod spec at restore time, causing the application pod to be scheduled on the wrong node, which leads to a restore failure.
The restore fails with the error below because there is no node-agent DaemonSet pod running on that node.
time="2026-01-29T13:03:31Z" level=error msg="Velero restore error: node-agent pod is not running in node ip-10-0-55-175.us-east-2.compute.internal: daemonset pod not found in running state in node ip-10-0-55-175.us-east-2.compute.internal" logSource="pkg/controller/restore_controller.go:602" restore=openshift-adp/test-restore3
For more info, please refer to the Slack discussion:
https://redhat-internal.slack.com/archives/C039LRSDC8Z/p1769692353316809
Version-Release number of selected component (if applicable):
OADP deployed via OLM deploy (using the oadp-dev branch)
How reproducible:
Always
Steps to Reproduce:
1. Add a label to one of the worker nodes.
$ oc get node -l foo=bar
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-14-61.us-east-2.compute.internal   Ready    worker   9h    v1.32.10
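For reference, the label can be applied with a command like:

$ oc label node ip-10-0-14-61.us-east-2.compute.internal foo=bar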
2. Create a DPA with a node affinity setting so that node-agent pods are scheduled on this node.
$ oc get dpa ts-dpa -o yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  creationTimestamp: "2026-01-29T14:04:45Z"
  generation: 2
  name: ts-dpa
  namespace: openshift-adp
  resourceVersion: "348252"
  uid: 510d87c4-3682-4975-8e07-35893c7ddf59
spec:
  backupLocations:
  - velero:
      config:
        profile: default
        region: us-east-2
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: oadp9716nl66
        prefix: velero
      provider: aws
  configuration:
    nodeAgent:
      enable: true
      loadAffinity:
      - nodeSelector:
          matchExpressions:
          - key: foo
            operator: In
            values:
            - bar
      restorePVC:
        ignoreDelayBinding: true
      uploaderType: kopia
    velero:
      defaultPlugins:
      - aws
      - openshift
      - hypershift
      - csi
      disableFsBackup: false
      logFormat: text
status:
  conditions:
  - lastTransitionTime: "2026-01-29T14:04:45Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled
  - lastTransitionTime: "2026-01-29T14:04:50Z"
    message: 'Velero deployment ready: 1/1 replicas'
    reason: DeploymentReady
    status: "True"
    type: VeleroReady
  - lastTransitionTime: "2026-01-29T14:04:50Z"
    message: 'NodeAgent DaemonSet ready: 1/1 pods ready'
    reason: DaemonSetReady
    status: "True"
    type: NodeAgentReady
  - lastTransitionTime: "2026-01-29T14:04:45Z"
    message: Non-Admin controller is disabled
    reason: ComponentDisabled
    status: "True"
    type: NonAdminReady
  - lastTransitionTime: "2026-01-29T14:04:45Z"
    message: VM File Restore controller is disabled
    reason: ComponentDisabled
    status: "True"
    type: VMFileRestoreReady
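With this loadAffinity in place, the node-agent DaemonSet pod runs only on the labeled node, which can be checked with a command along these lines (assuming the node-agent pods carry the default name=node-agent label):

$ oc get pods -n openshift-adp -l name=node-agent -o wide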
3. Deploy application pods to the same labeled node. The command below creates a Deployment with a NodeSelector spec.
$ ansible-playbook deploy.yml -e use_role=ocp-mysql -e cluster_version=4.19 -e oc_binary=oc -e url=https://api.oadp-971.qe.devcluster.openshift.com:6443 -e token=sha256~We4w121SEOoOWQgSZiz8TDYHnmCWuWAdjCT2IeSlJsU -e namespace=test1 -e admin_token=sha256~We4w121SEOoOWQgSZiz8TDYHnmCWuWAdjCT2IeSlJsU -e '{"node_selector": {"foo": "bar"}}'
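For reference, a minimal sketch of the Deployment the role creates (names and image are illustrative; the relevant part is the nodeSelector, matching the node_selector variable passed above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql                  # illustrative name
  namespace: test1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      nodeSelector:
        foo: bar               # pins the pod to the node labeled in step 1
      containers:
      - name: mysql
        image: mysql:8.0       # illustrative image
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: example       # illustrative value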
4. Trigger an FS backup (see the example Backup CR after this list).
5. Remove the app namespace.
6. Execute a restore (see the example Restore CR after this list).
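A minimal sketch of the Backup and Restore CRs for steps 4 and 6 (the backup name is illustrative; the restore name is taken from the error message above, and defaultVolumesToFsBackup forces a file-system backup through the node-agent):

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: test-backup3              # illustrative name
  namespace: openshift-adp
spec:
  includedNamespaces:
  - test1
  defaultVolumesToFsBackup: true  # force FS (kopia) backup via the node-agent
---
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-restore3             # name as seen in the error message above
  namespace: openshift-adp
spec:
  backupName: test-backup3        # must match the Backup above
  restorePVs: true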
Actual results:
Restore partially failed with the error 'daemonset pod not found in running state'.
Expected results:
Restore should complete successfully.
Additional info: