OpenShift API for Data Protection / OADP-6168

Repository maintenance pod not respecting the affinity and resource configs in the DPA



      Description of problem:

      When the repository maintenance pod is configured to run on a specific node using node labels in the DPA config, it does not respect that configuration and runs only on the node where the application pod is running.

      The resource requests and limits are also being ignored.
      Version-Release number of selected component (if applicable):

      1.5.0

      How reproducible:

      Always

      Steps to Reproduce:
      1. Add the appropriate labels to the nodes (see the example below).
      2. Configure affinity and resource settings in the DPA config under spec.configuration.repositoryMaintenance.global.
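
      For reference, the labels can be applied with a command along these lines (the node name is the worker node shown in the output further below; the label keys and values match the DPA config):

      oc label node ip-10-0-91-209.us-east-2.compute.internal label.io/location=EU label.io/gpu=no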

      Actual results:

      The maintenance job runs on the wrong node.

      oc get dpa -o yaml
      apiVersion: v1
      items:
      - apiVersion: oadp.openshift.io/v1alpha1
        kind: DataProtectionApplication
        metadata:
          creationTimestamp: "2025-05-21T08:16:15Z"
          generation: 4
          name: ts-dpa
          namespace: openshift-adp
          resourceVersion: "71587"
          uid: 99ae2b16-d8c6-48ad-bc63-2e5ed89fbb16
        spec:
          backupLocations:
          - velero:
              config:
                region: us-east-2
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: oadp119731dh2k8
                prefix: velero
              provider: aws
          configuration:
            nodeAgent:
              enable: true
              uploaderType: kopia
            repositoryMaintenance:
              global:
                loadAffinity:
                - nodeSelector:
                    matchExpressions:
                    - key: label.io/location
                      operator: In
                      values:
                      - EU
                    matchLabels:
                      label.io/gpu: "no"
                podResources:
                  cpuLimit: 200m
                  cpuRequest: 100m
                  memoryLimit: 200Mi
                  memoryRequest: 100Mi
            velero:
              defaultPlugins:
              - csi
              - aws
              - openshift
              disableFsBackup: false
          logFormat: text
        status:
          conditions:
          - lastTransitionTime: "2025-05-21T08:16:15Z"
            message: Reconcile complete
            reason: Complete
            status: "True"
            type: Reconciled
      kind: List
      metadata:
        resourceVersion: ""
      

      This is the only node that satisfies both label requirements.

      oc get nodes --show-labels | grep gpu | grep EU
      
      ip-10-0-91-209.us-east-2.compute.internal   Ready    worker                 3h58m   v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m6i.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-91-209.us-east-2.compute.internal,kubernetes.io/os=linux,label.io/gpu=no,label.io/location=EU,machine.openshift.io/interruptible-instance=,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m6i.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2c,topology.k8s.aws/zone-id=use2-az3,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2c
      
      oc get pods -o wide
      NAME                                                    READY   STATUS      RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
      mysql-ts-dpa-1-kopia-maintain-job-1747821617594-vcttx   0/1     Completed   0          14m    10.131.0.50   ip-10-0-14-96.us-east-2.compute.internal    <none>           <none>
      node-agent-hrb6b                                        1/1     Running     0          118m   10.128.2.19   ip-10-0-91-209.us-east-2.compute.internal   <none>           <none>
      node-agent-nxccd                                        1/1     Running     0          118m   10.129.2.16   ip-10-0-60-216.us-east-2.compute.internal   <none>           <none>
      node-agent-p6gp7                                        1/1     Running     0          118m   10.131.0.41   ip-10-0-14-96.us-east-2.compute.internal    <none>           <none>
      openshift-adp-controller-manager-788c8c458b-bj9s5       1/1     Running     0          140m   10.128.2.18   ip-10-0-91-209.us-east-2.compute.internal   <none>           <none>
      velero-56b949b7b4-qpwjg                                 1/1     Running     0          109m   10.129.2.17   ip-10-0-60-216.us-east-2.compute.internal   <none>           <none>
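
      Where the pod was scheduled, and whether any affinity or resources were set on it, can be confirmed directly from the pod spec (pod name taken from the listing above):

      oc get pod mysql-ts-dpa-1-kopia-maintain-job-1747821617594-vcttx \
        -o jsonpath='{.spec.nodeName}{"\n"}{.spec.affinity}{"\n"}{.spec.containers[0].resources}{"\n"}'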
      

      BackupRepository:

      oc get backuprepository -o yaml
      apiVersion: v1
      items:
      - apiVersion: velero.io/v1
        kind: BackupRepository
        metadata:
          creationTimestamp: "2025-05-21T08:59:16Z"
          generation: 4
          labels:
            velero.io/repository-type: kopia
            velero.io/storage-location: ts-dpa-1
            velero.io/volume-namespace: mysql
          name: mysql-ts-dpa-1-kopia
          namespace: openshift-adp
          resourceVersion: "87330"
          uid: 2fe358bc-717a-4ec2-a831-2f09e2f21e0d
        spec:
          backupStorageLocation: ts-dpa-1
          maintenanceFrequency: 1h0m0s
          repositoryType: kopia
          resticIdentifier: s3:s3-us-east-2.amazonaws.com/oadp119731dh2k8/velero/restic/mysql
          volumeNamespace: mysql
        status:
          lastMaintenanceTime: "2025-05-21T10:00:22Z"
          phase: Ready
          recentMaintenance:
          - completeTimestamp: "2025-05-21T10:00:22Z"
            result: Succeeded
            startTimestamp: "2025-05-21T10:00:17Z"
      kind: List
      metadata:
        resourceVersion: ""
      

      ConfigMap:

      oc get cm repository-maintenance-ts-dpa -o yaml
      apiVersion: v1
      data:
        repository-maintenance-config: '{"global":{"loadAffinity":[{"nodeSelector":{"matchLabels":{"label.io/gpu":"no"},"matchExpressions":[{"key":"label.io/location","operator":"In","values":["EU"]}]}}],"podResources":{"cpuRequest":"100m","memoryRequest":"100Mi","cpuLimit":"200m","memoryLimit":"200Mi"}}}'
      kind: ConfigMap
      metadata:
        creationTimestamp: "2025-05-21T08:25:08Z"
        labels:
          app.kubernetes.io/component: repository-maintenance-config
          app.kubernetes.io/instance: ts-dpa
          app.kubernetes.io/managed-by: oadp-operator
          openshift.io/oadp: "True"
        name: repository-maintenance-ts-dpa
        namespace: openshift-adp
        ownerReferences:
        - apiVersion: oadp.openshift.io/v1alpha1
          blockOwnerDeletion: true
          controller: true
          kind: DataProtectionApplication
          name: ts-dpa
          uid: 99ae2b16-d8c6-48ad-bc63-2e5ed89fbb16
        resourceVersion: "63042"
        uid: c7e4ad9e-f793-40d8-b163-3b17484ae370
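
      For readability, the embedded JSON can be pretty-printed (assuming jq is available locally); it shows the operator did write the expected loadAffinity and podResources into the ConfigMap:

      oc get cm repository-maintenance-ts-dpa -o jsonpath="{.data['repository-maintenance-config']}" | jq .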
      

      Velero args:

      spec:
        containers:
        - args:
          - server
          - --features=EnableCSI
          - --uploader-type=kopia
          - --fs-backup-timeout=4h
          - --restore-resource-priorities=securitycontextconstraints,customresourcedefinitions,klusterletconfigs.config.open-cluster-management.io,managedcluster.cluster.open-cluster-management.io,namespaces,roles,rolebindings,clusterrolebindings,klusterletaddonconfig.agent.open-cluster-management.io,managedclusteraddon.addon.open-cluster-management.io,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
          - --log-format=text
          - --disable-informer-cache=false
          - --repo-maintenance-job-configmap=repository-maintenance-ts-dpa
      

      Expected results:

      The repository maintenance pod should run on the correct node.

      The resource requests and limits configuration should also be respected.
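
      For illustration, if the config were honored, the maintenance job's pod spec would be expected to include an affinity section and, on the container, a resources section roughly as follows (a sketch derived from the DPA values above, not the exact spec Velero generates):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: label.io/location
                operator: In
                values:
                - EU
              - key: label.io/gpu
                operator: In
                values:
                - "no"
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          cpu: 200m
          memory: 200Mi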
