OpenShift API for Data Protection / OADP-2530

Restore is partially failing for job resource



    Description

      Description of problem:

      Restore of a Job resource partially fails on OCP 4.14. This issue has not been seen on older OCP versions. OCP 4.14 (Kubernetes 1.27, per the backup annotations below) adds batch.kubernetes.io/-prefixed controller-uid and job-name labels to the Job's pod template that are not present on older OCP versions. A comparison is shown below.

      OCP 4.14 

       template:
          metadata:
            creationTimestamp: null
            labels:
              app: external-job
              batch.kubernetes.io/controller-uid: 6f6b5743-e39a-4057-b74c-692f1b8a3ab8
              batch.kubernetes.io/job-name: external-job
              controller-uid: 6f6b5743-e39a-4057-b74c-692f1b8a3ab8
              job-name: external-job 

       

      OCP 4.13

      template:
          metadata:
            creationTimestamp: null
            labels:
              app: external-job
              controller-uid: 370373ce-73ca-44ba-878d-043621b3f50f
              job-name: external-job 
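
      To check which labels the Job controller applied on a given cluster, the pod template labels can be read directly (the job name and namespace come from the reproducer below):

      $ oc get job external-job -n ocp-jobs -o jsonpath='{.spec.template.metadata.labels}{"\n"}'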

       

      Version-Release number of selected component (if applicable):
      OADP 1.1.6-8
      OCP 4.14 

       

      How reproducible:
      Always 

       

      Steps to Reproduce:
      1. Create a DPA
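
      A minimal DPA along these lines can be used (a sketch only: the provider, bucket, prefix, and credential values are assumptions; the name ts-dpa is inferred from the storageLocation ts-dpa-1 seen in the backup output below):

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        name: ts-dpa
        namespace: openshift-adp
      spec:
        configuration:
          velero:
            defaultPlugins:
            - openshift
            - aws                       # assumed provider plugin
        backupLocations:
        - velero:
            provider: aws               # assumed
            default: true
            credential:
              name: cloud-credentials   # assumed secret name
              key: cloud
            objectStorage:
              bucket: my-oadp-bucket    # assumed
              prefix: velero            # assumed
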
      2. Deploy a Job resource

      apiVersion: batch/v1
      kind: Job
      metadata:
        labels:
          app: external-job
        name: external-job
        namespace: ocp-jobs
      spec:
        template:
          metadata:
            labels:
              app: external-job
          spec:
            containers:
            - command:
              - sh
              - -c
              - |-
                echo 'Original source image: quay.io/migqe/alpine:3.16 Sleeping for 1d';
                sleep 1d;
                echo 'Done!'
              image: quay.io/migqe/alpine:3.16
              name: external-job
            restartPolicy: Never
      

      3. Execute backup
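
      The backup can be created from a Backup CR equivalent to the following (reconstructed from the spec in the oc get output below; only the fields visible there are set explicitly):

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: test-backup
        namespace: openshift-adp
      spec:
        includedNamespaces:
        - ocp-jobs
        storageLocation: ts-dpa-1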

      $ oc get backup test-backup -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.27.4+deb2c60
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "27"
        creationTimestamp: "2023-08-25T11:15:27Z"
        generation: 1
        labels:
          velero.io/storage-location: ts-dpa-1
        managedFields:
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:metadata:
              f:annotations:
                .: {}
                f:velero.io/source-cluster-k8s-gitversion: {}
                f:velero.io/source-cluster-k8s-major-version: {}
                f:velero.io/source-cluster-k8s-minor-version: {}
              f:labels:
                .: {}
                f:velero.io/storage-location: {}
            f:spec:
              .: {}
              f:csiSnapshotTimeout: {}
              f:defaultVolumesToRestic: {}
              f:hooks: {}
              f:includedNamespaces: {}
              f:metadata: {}
              f:storageLocation: {}
              f:ttl: {}
            f:status:
              .: {}
              f:completionTimestamp: {}
              f:expiration: {}
              f:formatVersion: {}
              f:phase: {}
              f:progress:
                .: {}
                f:itemsBackedUp: {}
                f:totalItems: {}
              f:startTimestamp: {}
              f:version: {}
          manager: velero-server
          operation: Update
          time: "2023-08-25T11:15:27Z"
        name: test-backup
        namespace: openshift-adp
        resourceVersion: "138427"
        uid: 1903dab0-b228-4684-8354-3b70c8a0bba1
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToRestic: false
        hooks: {}
        includedNamespaces:
        - ocp-jobs
        metadata: {}
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2023-08-25T09:14:11Z"
        expiration: "2023-09-24T09:13:41Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 41
          totalItems: 41
        startTimestamp: "2023-08-25T09:13:41Z"
        version: 1 

      4. Delete the application namespace
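      For example (the namespace name comes from the Job manifest above):

      $ oc delete namespace ocp-jobs
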
      5. Execute restore
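
      The restore was created from a Restore CR equivalent to the following (reconstructed from the spec in the oc get output below; the managedFields there show that kubectl-create set only backupName, and the excludedResources defaults were added by the Velero server):

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: test-restore
        namespace: openshift-adp
      spec:
        backupName: test-backup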

      $ oc get restore  test-restore -o yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2023-08-25T08:41:28Z"
        generation: 7
        managedFields:
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              .: {}
              f:backupName: {}
          manager: kubectl-create
          operation: Update
          time: "2023-08-25T08:41:28Z"
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              f:excludedResources: {}
            f:status:
              .: {}
              f:completionTimestamp: {}
              f:errors: {}
              f:phase: {}
              f:progress:
                .: {}
                f:itemsRestored: {}
                f:totalItems: {}
              f:startTimestamp: {}
              f:warnings: {}
          manager: velero-server
          operation: Update
          time: "2023-08-25T08:41:45Z"
        name: test-restore
        namespace: openshift-adp
        resourceVersion: "70994"
        uid: 90927865-fa4d-482c-9cf5-00907f65f8f5
      spec:
        backupName: test-backup
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        - csinodes.storage.k8s.io
        - volumeattachments.storage.k8s.io
      status:
        completionTimestamp: "2023-08-25T08:41:45Z"
        errors: 2
        phase: PartiallyFailed
        progress:
          itemsRestored: 29
          totalItems: 29
        startTimestamp: "2023-08-25T08:41:28Z"
        warnings: 4
      

      Actual results:

      Restore partially fails on OCP 4.14.

      $ velero restore logs test-restore -n openshift-adp | grep error
      time="2023-08-25T08:41:44Z" level=error msg="error restoring external-job: Job.batch \"external-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"external-job\", \"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\", \"batch.kubernetes.io/job-name\":\"external-job\", \"controller-uid\":\"b858296c-80cd-4fe0-83f1-48cc20c8d80f\", \"job-name\":\"external-job\"}: must be 'b858296c-80cd-4fe0-83f1-48cc20c8d80f', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/restore/restore.go:1388" restore=openshift-adp/test-restore
      time="2023-08-25T08:41:44Z" level=error msg="error restoring internal-job: Job.batch \"internal-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"internal-job\", \"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\", \"batch.kubernetes.io/job-name\":\"internal-job\", \"controller-uid\":\"bf7003b2-89f2-4c77-b884-da67ba936c67\", \"job-name\":\"internal-job\"}: must be 'bf7003b2-89f2-4c77-b884-da67ba936c67', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/restore/restore.go:1388" restore=openshift-adp/test-restore
      time="2023-08-25T08:41:45Z" level=error msg="Namespace ocp-jobs, resource restore error: error restoring jobs.batch/ocp-jobs/external-job: Job.batch \"external-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"external-job\", \"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\", \"batch.kubernetes.io/job-name\":\"external-job\", \"controller-uid\":\"b858296c-80cd-4fe0-83f1-48cc20c8d80f\", \"job-name\":\"external-job\"}: must be 'b858296c-80cd-4fe0-83f1-48cc20c8d80f', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:510" restore=openshift-adp/test-restore
      time="2023-08-25T08:41:45Z" level=error msg="Namespace ocp-jobs, resource restore error: error restoring jobs.batch/ocp-jobs/internal-job: Job.batch \"internal-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"internal-job\", \"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\", \"batch.kubernetes.io/job-name\":\"internal-job\", \"controller-uid\":\"bf7003b2-89f2-4c77-b884-da67ba936c67\", \"job-name\":\"internal-job\"}: must be 'bf7003b2-89f2-4c77-b884-da67ba936c67', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:510" restore=openshift-adp/test-restore 

      Expected results:

      Restore should be successful.

      Additional info:

      OCP 4.14 result
      https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/cam/job/oadp-1.1-tier1-tests/1005/

      Tests are passing in OCP 4.13
      https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/cam/job/oadp-1.1-tier1-tests/1006/
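
      A possible interim workaround (an assumption on my part, not verified in this report) is to exclude Jobs from the restore and recreate them afterwards from their original manifests:

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: test-restore-no-jobs
        namespace: openshift-adp
      spec:
        backupName: test-backup
        excludedResources:
        - jobs.batch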
