Sub-task
Resolution: Done
Description of problem:
When the repository maintenance pod is configured to run on a specific node using node labels in the DPA config, the configuration is not respected and the pod runs only on the node where the application pod is running.
The configured resource requests and limits are also ignored.
Version-Release number of selected component (if applicable):
1.5.0
How reproducible:
Always
Steps to Reproduce:
1. Add the appropriate labels to the nodes (an example command follows these steps).
2. Configure affinity and resource settings in the DPA config under spec.configuration.repositoryMaintenance.global.
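For step 1, the labels used in this report can be applied with a command along these lines (node name and label values are taken from the environment shown under Actual results; adjust for other environments):

oc label node ip-10-0-91-209.us-east-2.compute.internal label.io/location=EU label.io/gpu=no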
Actual results:
The maintenance job runs on the wrong node, and the configured pod resources are not applied.
oc get dpa -o yaml
apiVersion: v1
items:
- apiVersion: oadp.openshift.io/v1alpha1
  kind: DataProtectionApplication
  metadata:
    creationTimestamp: "2025-05-21T08:16:15Z"
    generation: 4
    name: ts-dpa
    namespace: openshift-adp
    resourceVersion: "71587"
    uid: 99ae2b16-d8c6-48ad-bc63-2e5ed89fbb16
  spec:
    backupLocations:
    - velero:
        config:
          region: us-east-2
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: oadp119731dh2k8
          prefix: velero
        provider: aws
    configuration:
      nodeAgent:
        enable: true
        uploaderType: kopia
      repositoryMaintenance:
        global:
          loadAffinity:
          - nodeSelector:
              matchExpressions:
              - key: label.io/location
                operator: In
                values:
                - EU
              matchLabels:
                label.io/gpu: "no"
          podResources:
            cpuLimit: 200m
            cpuRequest: 100m
            memoryLimit: 200Mi
            memoryRequest: 100Mi
      velero:
        defaultPlugins:
        - csi
        - aws
        - openshift
        disableFsBackup: false
        logFormat: text
  status:
    conditions:
    - lastTransitionTime: "2025-05-21T08:16:15Z"
      message: Reconcile complete
      reason: Complete
      status: "True"
      type: Reconciled
kind: List
metadata:
  resourceVersion: ""
This is the only node satisfying both labels:
oc get nodes --show-labels | grep gpu | grep EU
ip-10-0-91-209.us-east-2.compute.internal   Ready   worker   3h58m   v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m6i.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-91-209.us-east-2.compute.internal,kubernetes.io/os=linux,label.io/gpu=no,label.io/location=EU,machine.openshift.io/interruptible-instance=,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m6i.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2c,topology.k8s.aws/zone-id=use2-az3,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2c

oc get pods -o wide
NAME                                                     READY   STATUS      RESTARTS   AGE    IP            NODE                                         NOMINATED NODE   READINESS GATES
mysql-ts-dpa-1-kopia-maintain-job-1747821617594-vcttx    0/1     Completed   0          14m    10.131.0.50   ip-10-0-14-96.us-east-2.compute.internal     <none>           <none>
node-agent-hrb6b                                         1/1     Running     0          118m   10.128.2.19   ip-10-0-91-209.us-east-2.compute.internal    <none>           <none>
node-agent-nxccd                                         1/1     Running     0          118m   10.129.2.16   ip-10-0-60-216.us-east-2.compute.internal    <none>           <none>
node-agent-p6gp7                                         1/1     Running     0          118m   10.131.0.41   ip-10-0-14-96.us-east-2.compute.internal     <none>           <none>
openshift-adp-controller-manager-788c8c458b-bj9s5        1/1     Running     0          140m   10.128.2.18   ip-10-0-91-209.us-east-2.compute.internal    <none>           <none>
velero-56b949b7b4-qpwjg                                  1/1     Running     0          109m   10.129.2.17   ip-10-0-60-216.us-east-2.compute.internal    <none>           <none>
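The maintenance job pod landed on ip-10-0-14-96 rather than the labeled node. To confirm whether the affinity and resources were propagated into its pod spec, the pod can be inspected directly (pod name taken from the listing above):

oc get pod mysql-ts-dpa-1-kopia-maintain-job-1747821617594-vcttx -n openshift-adp -o jsonpath='{.spec.affinity}{"\n"}{.spec.containers[0].resources}{"\n"}'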
BackupRepository:
oc get backuprepository -o yaml
apiVersion: v1
items:
- apiVersion: velero.io/v1
  kind: BackupRepository
  metadata:
    creationTimestamp: "2025-05-21T08:59:16Z"
    generation: 4
    labels:
      velero.io/repository-type: kopia
      velero.io/storage-location: ts-dpa-1
      velero.io/volume-namespace: mysql
    name: mysql-ts-dpa-1-kopia
    namespace: openshift-adp
    resourceVersion: "87330"
    uid: 2fe358bc-717a-4ec2-a831-2f09e2f21e0d
  spec:
    backupStorageLocation: ts-dpa-1
    maintenanceFrequency: 1h0m0s
    repositoryType: kopia
    resticIdentifier: s3:s3-us-east-2.amazonaws.com/oadp119731dh2k8/velero/restic/mysql
    volumeNamespace: mysql
  status:
    lastMaintenanceTime: "2025-05-21T10:00:22Z"
    phase: Ready
    recentMaintenance:
    - completeTimestamp: "2025-05-21T10:00:22Z"
      result: Succeeded
      startTimestamp: "2025-05-21T10:00:17Z"
kind: List
metadata:
  resourceVersion: ""
ConfigMap:
oc get cm repository-maintenance-ts-dpa -o yaml
apiVersion: v1
data:
  repository-maintenance-config: '{"global":{"loadAffinity":[{"nodeSelector":{"matchLabels":{"label.io/gpu":"no"},"matchExpressions":[{"key":"label.io/location","operator":"In","values":["EU"]}]}}],"podResources":{"cpuRequest":"100m","memoryRequest":"100Mi","cpuLimit":"200m","memoryLimit":"200Mi"}}}'
kind: ConfigMap
metadata:
  creationTimestamp: "2025-05-21T08:25:08Z"
  labels:
    app.kubernetes.io/component: repository-maintenance-config
    app.kubernetes.io/instance: ts-dpa
    app.kubernetes.io/managed-by: oadp-operator
    openshift.io/oadp: "True"
  name: repository-maintenance-ts-dpa
  namespace: openshift-adp
  ownerReferences:
  - apiVersion: oadp.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DataProtectionApplication
    name: ts-dpa
    uid: 99ae2b16-d8c6-48ad-bc63-2e5ed89fbb16
  resourceVersion: "63042"
  uid: c7e4ad9e-f793-40d8-b163-3b17484ae370
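The ConfigMap content matches the DPA settings. The embedded JSON can be pretty-printed for readability (assuming jq is available):

oc get cm repository-maintenance-ts-dpa -n openshift-adp -o jsonpath='{.data.repository-maintenance-config}' | jq .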
velero args:
spec:
  containers:
  - args:
    - server
    - --features=EnableCSI
    - --uploader-type=kopia
    - --fs-backup-timeout=4h
    - --restore-resource-priorities=securitycontextconstraints,customresourcedefinitions,klusterletconfigs.config.open-cluster-management.io,managedcluster.cluster.open-cluster-management.io,namespaces,roles,rolebindings,clusterrolebindings,klusterletaddonconfig.agent.open-cluster-management.io,managedclusteraddon.addon.open-cluster-management.io,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
    - --log-format=text
    - --disable-informer-cache=false
    - --repo-maintenance-job-configmap=repository-maintenance-ts-dpa
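Since --repo-maintenance-job-configmap points at the ConfigMap above, the maintenance Job spec itself can also be checked to see whether Velero applied the configuration when creating it (job name inferred from the pod name in the listing above):

oc get job mysql-ts-dpa-1-kopia-maintain-job-1747821617594 -n openshift-adp -o jsonpath='{.spec.template.spec.affinity}{"\n"}{.spec.template.spec.containers[0].resources}{"\n"}'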
Expected results:
The repository maintenance pod should be scheduled on the node matching the configured load affinity (ip-10-0-91-209 in this environment).
The configured resource requests and limits should also be applied to the maintenance pod.
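For reference, when the configuration is honored, the maintenance job pod spec would be expected to carry fields along these lines (a sketch derived from the DPA settings above; the exact rendering Velero produces may differ, e.g. matchLabels may be converted into additional matchExpressions):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: label.io/location
          operator: In
          values:
          - EU
        - key: label.io/gpu
          operator: In
          values:
          - "no"
resources:
  requests:
    cpu: 100m
    memory: 100Mi
  limits:
    cpu: 200m
    memory: 200Mi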