OpenShift API for Data Protection / OADP-6168

Repository maintenance pod not respecting the affinity and resource configs in the DPA



      Description of problem:

      When the repository maintenance pod is configured to run on a specific node using node labels in the DPA config, it does not respect that configuration and runs only on the node where the application pod is running.

      The resource requests and limits are also being ignored.
      Version-Release number of selected component (if applicable):

      1.5.0

      How reproducible:

      Always

      Steps to Reproduce:
      1. Add the appropriate labels to the nodes (see the example below).
      2. Configure affinity and resource settings in the DPA config under spec.configuration.repositoryMaintenance.global.
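
      For reference, the labels can be applied with a command along these lines (the node name is the worker node shown in the output further below; the label keys and values match the DPA config):

      oc label node ip-10-0-91-209.us-east-2.compute.internal label.io/location=EU label.io/gpu=no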

      Actual results:

      The maintenance job runs on the wrong node.

      oc get dpa -o yaml
      apiVersion: v1
      items:
      - apiVersion: oadp.openshift.io/v1alpha1
        kind: DataProtectionApplication
        metadata:
          creationTimestamp: "2025-05-21T08:16:15Z"
          generation: 4
          name: ts-dpa
          namespace: openshift-adp
          resourceVersion: "71587"
          uid: 99ae2b16-d8c6-48ad-bc63-2e5ed89fbb16
        spec:
          backupLocations:
          - velero:
              config:
                region: us-east-2
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: oadp119731dh2k8
                prefix: velero
              provider: aws
          configuration:
            nodeAgent:
              enable: true
              uploaderType: kopia
            repositoryMaintenance:
              global:
                loadAffinity:
                - nodeSelector:
                    matchExpressions:
                    - key: label.io/location
                      operator: In
                      values:
                      - EU
                    matchLabels:
                      label.io/gpu: "no"
                podResources:
                  cpuLimit: 200m
                  cpuRequest: 100m
                  memoryLimit: 200Mi
                  memoryRequest: 100Mi
            velero:
              defaultPlugins:
              - csi
              - aws
              - openshift
              disableFsBackup: false
          logFormat: text
        status:
          conditions:
          - lastTransitionTime: "2025-05-21T08:16:15Z"
            message: Reconcile complete
            reason: Complete
            status: "True"
            type: Reconciled
      kind: List
      metadata:
        resourceVersion: ""
      

      This is the only node that satisfies both label requirements.

      oc get nodes --show-labels | grep gpu | grep EU
      
      ip-10-0-91-209.us-east-2.compute.internal   Ready    worker                 3h58m   v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m6i.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-91-209.us-east-2.compute.internal,kubernetes.io/os=linux,label.io/gpu=no,label.io/location=EU,machine.openshift.io/interruptible-instance=,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m6i.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2c,topology.k8s.aws/zone-id=use2-az3,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2c
      
      oc get pods -o wide
      NAME                                                    READY   STATUS      RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
      mysql-ts-dpa-1-kopia-maintain-job-1747821617594-vcttx   0/1     Completed   0          14m    10.131.0.50   ip-10-0-14-96.us-east-2.compute.internal    <none>           <none>
      node-agent-hrb6b                                        1/1     Running     0          118m   10.128.2.19   ip-10-0-91-209.us-east-2.compute.internal   <none>           <none>
      node-agent-nxccd                                        1/1     Running     0          118m   10.129.2.16   ip-10-0-60-216.us-east-2.compute.internal   <none>           <none>
      node-agent-p6gp7                                        1/1     Running     0          118m   10.131.0.41   ip-10-0-14-96.us-east-2.compute.internal    <none>           <none>
      openshift-adp-controller-manager-788c8c458b-bj9s5       1/1     Running     0          140m   10.128.2.18   ip-10-0-91-209.us-east-2.compute.internal   <none>           <none>
      velero-56b949b7b4-qpwjg                                 1/1     Running     0          109m   10.129.2.17   ip-10-0-60-216.us-east-2.compute.internal   <none>           <none>
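
      Where the pod was scheduled, and whether any affinity or resources were set on it, can be confirmed directly from the pod spec (pod name taken from the listing above):

      oc get pod mysql-ts-dpa-1-kopia-maintain-job-1747821617594-vcttx \
        -o jsonpath='{.spec.nodeName}{"\n"}{.spec.affinity}{"\n"}{.spec.containers[0].resources}{"\n"}'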
      

      BackupRepository:

      oc get backuprepository -o yaml
      apiVersion: v1
      items:
      - apiVersion: velero.io/v1
        kind: BackupRepository
        metadata:
          creationTimestamp: "2025-05-21T08:59:16Z"
          generation: 4
          labels:
            velero.io/repository-type: kopia
            velero.io/storage-location: ts-dpa-1
            velero.io/volume-namespace: mysql
          name: mysql-ts-dpa-1-kopia
          namespace: openshift-adp
          resourceVersion: "87330"
          uid: 2fe358bc-717a-4ec2-a831-2f09e2f21e0d
        spec:
          backupStorageLocation: ts-dpa-1
          maintenanceFrequency: 1h0m0s
          repositoryType: kopia
          resticIdentifier: s3:s3-us-east-2.amazonaws.com/oadp119731dh2k8/velero/restic/mysql
          volumeNamespace: mysql
        status:
          lastMaintenanceTime: "2025-05-21T10:00:22Z"
          phase: Ready
          recentMaintenance:
          - completeTimestamp: "2025-05-21T10:00:22Z"
            result: Succeeded
            startTimestamp: "2025-05-21T10:00:17Z"
      kind: List
      metadata:
        resourceVersion: ""
      

      ConfigMap:

      oc get cm repository-maintenance-ts-dpa -o yaml
      apiVersion: v1
      data:
        repository-maintenance-config: '{"global":{"loadAffinity":[{"nodeSelector":{"matchLabels":{"label.io/gpu":"no"},"matchExpressions":[{"key":"label.io/location","operator":"In","values":["EU"]}]}}],"podResources":{"cpuRequest":"100m","memoryRequest":"100Mi","cpuLimit":"200m","memoryLimit":"200Mi"}}}'
      kind: ConfigMap
      metadata:
        creationTimestamp: "2025-05-21T08:25:08Z"
        labels:
          app.kubernetes.io/component: repository-maintenance-config
          app.kubernetes.io/instance: ts-dpa
          app.kubernetes.io/managed-by: oadp-operator
          openshift.io/oadp: "True"
        name: repository-maintenance-ts-dpa
        namespace: openshift-adp
        ownerReferences:
        - apiVersion: oadp.openshift.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: DataProtectionApplication
          name: ts-dpa
          uid: 99ae2b16-d8c6-48ad-bc63-2e5ed89fbb16
        resourceVersion: "63042"
        uid: c7e4ad9e-f793-40d8-b163-3b17484ae370
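
      For readability, the embedded JSON can be pretty-printed (assuming jq is available locally); it shows the operator did write the expected loadAffinity and podResources into the ConfigMap:

      oc get cm repository-maintenance-ts-dpa -o jsonpath="{.data['repository-maintenance-config']}" | jq .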
      

      Velero args:

      spec:
        containers:
        - args:
          - server
          - --features=EnableCSI
          - --uploader-type=kopia
          - --fs-backup-timeout=4h
          - --restore-resource-priorities=securitycontextconstraints,customresourcedefinitions,klusterletconfigs.config.open-cluster-management.io,managedcluster.cluster.open-cluster-management.io,namespaces,roles,rolebindings,clusterrolebindings,klusterletaddonconfig.agent.open-cluster-management.io,managedclusteraddon.addon.open-cluster-management.io,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
          - --log-format=text
          - --disable-informer-cache=false
          - --repo-maintenance-job-configmap=repository-maintenance-ts-dpa
      

      Expected results:

      The repository maintenance pod should run on the correct node.

      The resource requests and limits configuration should also be respected.
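
      For illustration, if the config were honored, the maintenance job's pod spec would be expected to include an affinity section and, on the container, a resources section roughly as follows (a sketch derived from the DPA values above, not the exact spec Velero generates):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: label.io/location
                operator: In
                values:
                - EU
              - key: label.io/gpu
                operator: In
                values:
                - "no"
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          cpu: 200m
          memory: 200Mi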
