OpenShift API for Data Protection / OADP-7370

[OpenShift Plugin] NodeSelector stripped during pod restore causing restores to fail


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • openshift-plugin
    • None
    • Quality / Stability / Reliability
    • 3
    • False
    • None
    • False
    • ToDo
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • None

      Description of problem:

      https://github.com/migtools/openshift-migration-plugin/pull/2

      The nodeSelector is stripped from the pod spec at restore time, so the application pod can be scheduled on the wrong node, which causes the restore to fail.

       

      The restore fails with the error below because no node-agent DaemonSet pod is running on that node.

      time="2026-01-29T13:03:31Z" level=error msg="Velero restore error: node-agent pod is not running in node ip-10-0-55-175.us-east-2.compute.internal: daemonset pod not found in running state in node ip-10-0-55-175.us-east-2.compute.internal" logSource="pkg/controller/restore_controller.go:602" restore=openshift-adp/test-restore3
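One way to confirm the mismatch is to compare where the node-agent pods run with where the restored application pod was scheduled. The sketch below assumes the `openshift-adp` namespace from the log and the `test1` application namespace from the reproduction steps; `show_placement` is a hypothetical helper name, not part of OADP.

```shell
# Hypothetical helper: print pod-to-node placement for the OADP namespace
# and for the application namespace so the two can be compared by eye.
# Nothing runs until the function is called explicitly.
show_placement() {
  app_ns="${1:-test1}"
  echo "== pods in openshift-adp =="
  oc get pods -n openshift-adp -o wide
  echo "== pods in ${app_ns} =="
  oc get pods -n "$app_ns" -o wide
}

# Example call:
# show_placement test1
```

If the NODE column of the application pod names a node that has no node-agent pod, that would be consistent with the error above.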

      For more information, see the Slack discussion:

      https://redhat-internal.slack.com/archives/C039LRSDC8Z/p1769692353316809

       

      Version-Release number of selected component (if applicable):

      OADP deployed via olm deploy (using the oadp-dev branch)

       

      How reproducible:
      Always

       

      Steps to Reproduce:
      1. Add a label to one of the worker nodes.

      $ oc get node -l foo=bar 
      NAME                                       STATUS   ROLES    AGE   VERSION
      ip-10-0-14-61.us-east-2.compute.internal   Ready    worker   9h    v1.32.10

      2. Create a DPA with a node affinity setting (loadAffinity) so that node-agent pods are scheduled on this node.

      $ oc get dpa ts-dpa -o yaml
      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        creationTimestamp: "2026-01-29T14:04:45Z"
        generation: 2
        name: ts-dpa
        namespace: openshift-adp
        resourceVersion: "348252"
        uid: 510d87c4-3682-4975-8e07-35893c7ddf59
      spec:
        backupLocations:
        - velero:
            config:
              profile: default
              region: us-east-2
            credential:
              key: cloud
              name: cloud-credentials
            default: true
            objectStorage:
              bucket: oadp9716nl66
              prefix: velero
            provider: aws
        configuration:
          nodeAgent:
            enable: true
            loadAffinity:
            - nodeSelector:
                matchExpressions:
                - key: foo
                  operator: In
                  values:
                  - bar
            restorePVC:
              ignoreDelayBinding: true
            uploaderType: kopia
          velero:
            defaultPlugins:
            - aws
            - openshift
            - hypershift
            - csi
            disableFsBackup: false
        logFormat: text
      status:
        conditions:
        - lastTransitionTime: "2026-01-29T14:04:45Z"
          message: Reconcile complete
          reason: Complete
          status: "True"
          type: Reconciled
        - lastTransitionTime: "2026-01-29T14:04:50Z"
          message: 'Velero deployment ready: 1/1 replicas'
          reason: DeploymentReady
          status: "True"
          type: VeleroReady
        - lastTransitionTime: "2026-01-29T14:04:50Z"
          message: 'NodeAgent DaemonSet ready: 1/1 pods ready'
          reason: DaemonSetReady
          status: "True"
          type: NodeAgentReady
        - lastTransitionTime: "2026-01-29T14:04:45Z"
          message: Non-Admin controller is disabled
          reason: ComponentDisabled
          status: "True"
          type: NonAdminReady
        - lastTransitionTime: "2026-01-29T14:04:45Z"
          message: VM File Restore controller is disabled
          reason: ComponentDisabled
          status: "True"
          type: VMFileRestoreReady 

      3. Deploy application pods to the same labeled node. The command below creates a deployment with a nodeSelector in its pod spec.

      $ ansible-playbook deploy.yml -e use_role=ocp-mysql -e cluster_version=4.19 -e oc_binary=oc -e url=https://api.oadp-971.qe.devcluster.openshift.com:6443 -e token=sha256~We4w121SEOoOWQgSZiz8TDYHnmCWuWAdjCT2IeSlJsU -e namespace=test1 -e admin_token=sha256~We4w121SEOoOWQgSZiz8TDYHnmCWuWAdjCT2IeSlJsU -e '{"node_selector": {"foo": "bar"}}' 

      4. Trigger a file system (FS) backup.
      5. Remove the application namespace.
      6. Execute the restore.
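Steps 4 to 6 could be driven with Velero custom resources along these lines. This is a sketch under stated assumptions, not the reporter's exact procedure: the backup name `test-backup3` is hypothetical, the restore name `test-restore3` and the `openshift-adp` namespace come from the log in the description, and `test1` is the application namespace from step 3. The helper names `backup_manifest` and `restore_manifest` are also hypothetical.

```shell
# Print a Velero Backup manifest that uses file system backup for pod volumes.
# Arguments: backup name, application namespace.
backup_manifest() {
  cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: $1
  namespace: openshift-adp
spec:
  includedNamespaces:
  - $2
  defaultVolumesToFsBackup: true
EOF
}

# Print a Velero Restore manifest that restores from the named backup.
# Arguments: restore name, backup name.
restore_manifest() {
  cat <<EOF
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: $1
  namespace: openshift-adp
spec:
  backupName: $2
EOF
}

# Example sequence (not executed here):
#   backup_manifest test-backup3 test1 | oc apply -f -
#   # wait for the Backup to reach phase Completed, then:
#   oc delete namespace test1
#   restore_manifest test-restore3 test-backup3 | oc apply -f -
```

Keeping the manifests in functions makes it easy to rerun the reproduction with different names without editing YAML by hand.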

       

      Actual results:

      The restore partially failed with the error "daemonset pod not found in running state".

       

      Expected results:

       

      The restore should complete successfully.
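A rough check for the expected outcome is that the restored pods still carry the `foo: bar` node selector from step 3. The sketch below assumes the `test1` application namespace; `show_node_selectors` is a hypothetical helper name.

```shell
# Hypothetical helper: print each pod name followed by its spec.nodeSelector.
# An empty selector after restore would match the behavior reported here.
show_node_selectors() {
  ns="${1:-test1}"
  oc get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeSelector}{"\n"}{end}'
}

# Example call:
# show_node_selectors test1
```

Running it before the backup and again after the restore gives a direct before/after comparison of the selector.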

      Additional info:

              wnstb Wes Hayutin
              rhn-support-prajoshi Prasad Joshi
              Prasad Joshi Prasad Joshi