OpenShift API for Data Protection / OADP-7025

FS restore partially fails when loadAffinity set on NodeAgent pod


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Labels: kopia, restore
    • Quality / Stability / Reliability

      Description of problem:

      Noticed an issue while testing the nodeAgent loadAffinity setting: restore partially fails with an error that the node-agent pod is not found on the specified node. Error attached below:

        Velero:   node-agent pod is not running in node oadp-137771-bpr9v-worker-b-jhs7p: daemonset pod not found in running state in node oadp-137771-bpr9v-worker-b-jhs7p

       

      Version-Release number of selected component (if applicable):

      Deployed OADP via the make deploy command.

      How reproducible:
      Always

       

      Steps to Reproduce:
      1. Added a label (foo=bar) to one of the worker nodes:

       

      $ oc get nodes -l foo=bar
      NAME                               STATUS   ROLES    AGE     VERSION
      oadp-137771-bpr9v-worker-a-sgxkh   Ready    worker   5h57m   v1.33.5 
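The label in step 1 was presumably applied with a command along these lines (node name taken from the output above; the exact invocation is not in the original report):

```shell
# Hypothetical reconstruction of step 1: label one worker node so that the
# nodeAgent loadAffinity selector (key "foo", operator In, value "bar") matches it.
oc label node oadp-137771-bpr9v-worker-a-sgxkh foo=bar
```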

       

      2. Created a DPA with the nodeAgent loadAffinity spec:

       

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        creationTimestamp: "2025-11-24T11:32:25Z"
        generation: 1
        name: ts-dpa
        namespace: openshift-adp
        resourceVersion: "154981"
        uid: c3d5573f-f237-4eaa-b0d4-c127ff06bafa
      spec:
        backupLocations:
        - velero:
            credential:
              key: cloud
              name: cloud-credentials-gcp
            default: true
            objectStorage:
              bucket: oadp137771bpr9v
              prefix: velero-e2e-3ea14aaa-c929-11f0-a5be-5ea249a46217
            provider: gcp
        configuration:
          nodeAgent:
            enable: true
            loadAffinity:
            - nodeSelector:
                matchExpressions:
                - key: foo
                  operator: In
                  values:
                  - bar
            restorePVC:
              ignoreDelayBinding: true
            uploaderType: kopia
          velero:
            defaultPlugins:
            - openshift
            - gcp
            - kubevirt
            - hypershift
            disableFsBackup: false
        logFormat: text
        podDnsConfig: {}
        snapshotLocations: []
      status:
        conditions:
        - lastTransitionTime: "2025-11-24T11:32:25Z"
          message: Reconcile complete
          reason: Complete
          status: "True"
          type: Reconciled
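With this loadAffinity in place, the node-agent DaemonSet pods should only be scheduled on nodes matching foo=bar. One way to verify (assuming the node-agent pods carry the usual name=node-agent label; not shown in the original report):

```shell
# List where node-agent pods actually run, and which nodes the selector matches.
oc get pods -n openshift-adp -l name=node-agent -o wide
oc get nodes -l foo=bar
```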

      3. Deployed an application on the same labeled node:

      $ oc get pod -n test-oadp-683 -o wide
      NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
      mysql-67fd7fdff6-9vnbd   1/1     Running   0          7m20s   10.128.2.74   oadp-137771-bpr9v-worker-a-sgxkh   <none>           <none> 
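The application presumably landed on the labeled node via a nodeSelector (or a similar scheduling constraint) in its Deployment; a minimal sketch, assuming the same foo=bar label is reused for the app:

```yaml
# Hypothetical pod-spec fragment pinning the app to the labeled node.
spec:
  template:
    spec:
      nodeSelector:
        foo: bar
```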

      4. Created an FSB backup:

       apiVersion: velero.io/v1
        kind: Backup
        metadata:
          creationTimestamp: "2025-11-24T11:46:40Z"
          generation: 6
          labels:
            velero.io/storage-location: ts-dpa-1
          name: mysql-3f8614d2-c929-11f0-a5be-5ea249a46217
          namespace: openshift-adp
          resourceVersion: "159333"
          uid: 2ddc263d-a492-485c-92a8-c5a75be7ec1c
        spec:
          csiSnapshotTimeout: 10m0s
          defaultVolumesToFsBackup: true
          excludedClusterScopedResources:
          - volumesnapshotcontents.snapshot.storage.k8s.io
          excludedNamespaceScopedResources:
          - volumesnapshots.snapshot.storage.k8s.io
          hooks: {}
          includedNamespaces:
          - test-oadp-683
          itemOperationTimeout: 4h0m0s
          metadata: {}
          snapshotMoveData: false
          storageLocation: ts-dpa-1
          ttl: 720h0m0s
          volumeGroupSnapshotLabelKey: velero.io/volume-group
        status:
          completionTimestamp: "2025-11-24T11:47:04Z"
          expiration: "2025-12-24T11:46:40Z"
          formatVersion: 1.1.0
          hookStatus: {}
          phase: Completed
          progress:
            itemsBackedUp: 43
            totalItems: 43
          startTimestamp: "2025-11-24T11:46:40Z"
          version: 1 
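A backup with this spec could have been created either from the YAML above or with a CLI call along these lines (names taken from the spec; the exact invocation is not in the report):

```shell
# Hypothetical CLI equivalent of the backup spec above.
velero backup create mysql-3f8614d2-c929-11f0-a5be-5ea249a46217 \
  --include-namespaces test-oadp-683 \
  --default-volumes-to-fs-backup \
  --storage-location ts-dpa-1 \
  -n openshift-adp
```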

      5. Removed the application namespace
      6. Triggered a restore
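Steps 5 and 6 correspond to commands along these lines (restore name matches the describe output that follows; the exact invocation is not in the report):

```shell
# Hypothetical reconstruction of steps 5-6.
oc delete namespace test-oadp-683
velero restore create test-restore \
  --from-backup mysql-3f8614d2-c929-11f0-a5be-5ea249a46217 \
  -n openshift-adp
```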

       

      Actual results: 

      Restore partially failed with a "node-agent pod is not running" error. The restore apparently targeted node oadp-137771-bpr9v-worker-b-jhs7p, which the loadAffinity selector excludes, so no node-agent pod was scheduled there.

       

      $ velero describe restore test-restore -n openshift-adp --details 
      Name:         test-restore
      Namespace:    openshift-adp
      Labels:       <none>
      Annotations:  <none>
      Phase:                       PartiallyFailed (run 'velero restore logs test-restore' for more information)
      Total items to be restored:  26
      Items restored:              26
      Started:    2025-11-24 17:31:45 +0530 IST
      Completed:  2025-11-24 17:31:52 +0530 IST
      Warnings:
        Velero:     <none>
        Cluster:    <none>
        Namespaces:
          test-oadp-683:  could not restore, RoleBinding:system:image-pullers already exists. Warning: the in-cluster version is different than the backed-up version
                          could not restore, ConfigMap:kube-root-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version
                          could not restore, ConfigMap:openshift-service-ca.crt already exists. Warning: the in-cluster version is different than the backed-up version
      Errors:
        Velero:   node-agent pod is not running in node oadp-137771-bpr9v-worker-b-jhs7p: daemonset pod not found in running state in node oadp-137771-bpr9v-worker-b-jhs7p
        Cluster:    <none>
        Namespaces: <none>
      Backup:  mysql-3f8614d2-c929-11f0-a5be-5ea249a46217
      
       

       

      Expected results:

      Restore should complete successfully.

       

      Additional info:

      time="2025-11-24T13:37:27Z" level=error msg="Velero restore error: node-agent pod is not running in node oadp-137771-bpr9v-worker-b-jhs7p: daemonset pod not found in running state in node oadp-137771-bpr9v-worker-b-jhs7p" logSource="pkg/controller/restore_controller.go:602" restore=openshift-adp/test-restore1
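To confirm the node mismatch suggested by this log line, one could compare where the restored pod was scheduled against where node-agent pods run (a diagnostic sketch; these checks are not in the original report):

```shell
# Where did the restored workload land, and which nodes have a node-agent pod?
oc get pod -n test-oadp-683 -o wide
oc get pods -n openshift-adp -o wide | grep node-agent
```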

       

      $ oc get cm node-agent-ts-dpa -o yaml
      apiVersion: v1
      data:
        node-agent-config: '{"loadAffinity":[{"nodeSelector":{"matchExpressions":[{"key":"foo","operator":"In","values":["bar"]}]}}],"restorePVC":{"ignoreDelayBinding":true},"privilegedFsBackup":true}'
      kind: ConfigMap
      metadata:
        creationTimestamp: "2025-11-24T11:32:25Z"
        labels:
          app.kubernetes.io/component: node-agent-config
          app.kubernetes.io/instance: ts-dpa
          app.kubernetes.io/managed-by: oadp-operator
          openshift.io/oadp: "True"
        name: node-agent-ts-dpa
        namespace: openshift-adp
        ownerReferences:
        - apiVersion: oadp.openshift.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: DataProtectionApplication
          name: ts-dpa
          uid: c3d5573f-f237-4eaa-b0d4-c127ff06bafa
        resourceVersion: "154955"
        uid: 2e422563-a974-410c-9884-1d2df019e229 
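As a quick offline sanity check, the node-agent-config JSON above can be parsed to confirm the selector the operator rendered (python3 is used here purely as a local JSON parser):

```shell
# Parse the node-agent-config value from the ConfigMap above and print the
# rendered loadAffinity match expression.
cfg='{"loadAffinity":[{"nodeSelector":{"matchExpressions":[{"key":"foo","operator":"In","values":["bar"]}]}}],"restorePVC":{"ignoreDelayBinding":true},"privilegedFsBackup":true}'
echo "$cfg" | python3 -c 'import json,sys; e=json.load(sys.stdin)["loadAffinity"][0]["nodeSelector"]["matchExpressions"][0]; print(e["key"], e["operator"], *e["values"])'
```

This prints the key, operator, and value of the match expression, confirming the operator propagated the DPA's loadAffinity into the node-agent ConfigMap unchanged.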

              wnstb Wes Hayutin
              rhn-support-prajoshi Prasad Joshi