Type: Sub-task
Resolution: Done
Description of problem:
Restore of a Job resource is partially failing on OCP 4.14. This issue has not been seen on older OCP versions. An additional label (batch.kubernetes.io/controller-uid) is being added to the Job resource that was not present on older OCP versions. An example is attached below.
OCP 4.14:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: external-job
        batch.kubernetes.io/controller-uid: 6f6b5743-e39a-4057-b74c-692f1b8a3ab8
        batch.kubernetes.io/job-name: external-job
        controller-uid: 6f6b5743-e39a-4057-b74c-692f1b8a3ab8
        job-name: external-job

OCP 4.13:

  template:
    metadata:
      creationTimestamp: null
      labels:
        app: external-job
        controller-uid: 370373ce-73ca-44ba-878d-043621b3f50f
        job-name: external-job
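Kubernetes 1.27, on which OCP 4.14 is based, introduced the batch.kubernetes.io/-prefixed copies of the controller-uid and job-name labels, and these embed the source cluster's Job UID in the backup. As a hypothetical workaround sketch (not the OADP/Velero fix itself; function and variable names are illustrative), one could strip the UID-derived labels and the auto-generated selector from a backed-up Job manifest so the API server regenerates them when the Job is recreated on the target cluster:

```python
# Hypothetical workaround sketch: strip UID-derived labels and the
# auto-generated selector from a backed-up Job manifest so the API
# server can regenerate them on restore. Names here are illustrative,
# not part of OADP or Velero.

UID_LABELS = (
    "controller-uid",
    "job-name",
    "batch.kubernetes.io/controller-uid",
    "batch.kubernetes.io/job-name",
)

def strip_job_uid_metadata(job: dict) -> dict:
    """Remove labels/selector that embed the old cluster's Job UID."""
    for labels in (
        job.get("metadata", {}).get("labels", {}),
        job.get("spec", {}).get("template", {}).get("metadata", {}).get("labels", {}),
    ):
        for key in UID_LABELS:
            labels.pop(key, None)
    # Drop the auto-generated selector entirely; the API server rebuilds
    # it from the UID it assigns to the newly created Job.
    job.get("spec", {}).pop("selector", None)
    return job
```

Applied to the 4.14 manifest above, this would leave only the user-defined app: external-job label in place.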
Version-Release number of selected component (if applicable):
OADP 1.1.6-8
OCP 4.14
How reproducible:
Always
Steps to Reproduce:
1. Create a DPA
2. Deploy a Job resource
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: external-job
  name: external-job
  namespace: ocp-jobs
spec:
  template:
    metadata:
      labels:
        app: external-job
    spec:
      containers:
      - command:
        - sh
        - -c
        - |-
          echo 'Original source image: quay.io/migqe/alpine:3.16 Sleeping for 1d';
          sleep 1d;
          echo 'Done!'
        image: quay.io/migqe/alpine:3.16
        name: external-job
      restartPolicy: Never
3. Execute backup
$ oc get backup test-backup2 -o yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/source-cluster-k8s-gitversion: v1.27.4+deb2c60
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "27"
  creationTimestamp: "2023-08-25T11:15:27Z"
  generation: 1
  labels:
    velero.io/storage-location: ts-dpa-1
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:velero.io/source-cluster-k8s-gitversion: {}
          f:velero.io/source-cluster-k8s-major-version: {}
          f:velero.io/source-cluster-k8s-minor-version: {}
        f:labels:
          .: {}
          f:velero.io/storage-location: {}
      f:spec:
        .: {}
        f:csiSnapshotTimeout: {}
        f:defaultVolumesToRestic: {}
        f:hooks: {}
        f:includedNamespaces: {}
        f:metadata: {}
        f:storageLocation: {}
        f:ttl: {}
      f:status:
        .: {}
        f:completionTimestamp: {}
        f:expiration: {}
        f:formatVersion: {}
        f:phase: {}
        f:progress:
          .: {}
          f:itemsBackedUp: {}
          f:totalItems: {}
        f:startTimestamp: {}
        f:version: {}
    manager: velero-server
    operation: Update
    time: "2023-08-25T11:15:27Z"
  name: test-backup
  namespace: openshift-adp
  resourceVersion: "138427"
  uid: 1903dab0-b228-4684-8354-3b70c8a0bba1
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToRestic: false
  hooks: {}
  includedNamespaces:
  - ocp-jobs
  metadata: {}
  storageLocation: ts-dpa-1
  ttl: 720h0m0s
status:
  completionTimestamp: "2023-08-25T09:14:11Z"
  expiration: "2023-09-24T09:13:41Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 41
    totalItems: 41
  startTimestamp: "2023-08-25T09:13:41Z"
  version: 1
4. Delete the app namespace
5. Execute restore
$ oc get restore test-restore -o yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  creationTimestamp: "2023-08-25T08:41:28Z"
  generation: 7
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:backupName: {}
    manager: kubectl-create
    operation: Update
    time: "2023-08-25T08:41:28Z"
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:excludedResources: {}
      f:status:
        .: {}
        f:completionTimestamp: {}
        f:errors: {}
        f:phase: {}
        f:progress:
          .: {}
          f:itemsRestored: {}
          f:totalItems: {}
        f:startTimestamp: {}
        f:warnings: {}
    manager: velero-server
    operation: Update
    time: "2023-08-25T08:41:45Z"
  name: test-restore
  namespace: openshift-adp
  resourceVersion: "70994"
  uid: 90927865-fa4d-482c-9cf5-00907f65f8f5
spec:
  backupName: test-backup
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
status:
  completionTimestamp: "2023-08-25T08:41:45Z"
  errors: 2
  phase: PartiallyFailed
  progress:
    itemsRestored: 29
    totalItems: 29
  startTimestamp: "2023-08-25T08:41:28Z"
  warnings: 4
Actual results:
Restore partially fails on OCP 4.14.
$ velero restore logs test-restore -n openshift-adp | grep error

time="2023-08-25T08:41:44Z" level=error msg="error restoring external-job: Job.batch \"external-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"external-job\", \"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\", \"batch.kubernetes.io/job-name\":\"external-job\", \"controller-uid\":\"b858296c-80cd-4fe0-83f1-48cc20c8d80f\", \"job-name\":\"external-job\"}: must be 'b858296c-80cd-4fe0-83f1-48cc20c8d80f', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/restore/restore.go:1388" restore=openshift-adp/test-restore

time="2023-08-25T08:41:44Z" level=error msg="error restoring internal-job: Job.batch \"internal-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"internal-job\", \"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\", \"batch.kubernetes.io/job-name\":\"internal-job\", \"controller-uid\":\"bf7003b2-89f2-4c77-b884-da67ba936c67\", \"job-name\":\"internal-job\"}: must be 'bf7003b2-89f2-4c77-b884-da67ba936c67', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/restore/restore.go:1388" restore=openshift-adp/test-restore

time="2023-08-25T08:41:45Z" level=error msg="Namespace ocp-jobs, resource restore error: error restoring jobs.batch/ocp-jobs/external-job: Job.batch \"external-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"external-job\", \"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\", \"batch.kubernetes.io/job-name\":\"external-job\", \"controller-uid\":\"b858296c-80cd-4fe0-83f1-48cc20c8d80f\", \"job-name\":\"external-job\"}: must be 'b858296c-80cd-4fe0-83f1-48cc20c8d80f', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"7b71b1c8-b299-482b-b713-4c74b82cd77f\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:510" restore=openshift-adp/test-restore

time="2023-08-25T08:41:45Z" level=error msg="Namespace ocp-jobs, resource restore error: error restoring jobs.batch/ocp-jobs/internal-job: Job.batch \"internal-job\" is invalid: [spec.template.metadata.labels[batch.kubernetes.io/controller-uid]: Invalid value: map[string]string{\"app\":\"internal-job\", \"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\", \"batch.kubernetes.io/job-name\":\"internal-job\", \"controller-uid\":\"bf7003b2-89f2-4c77-b884-da67ba936c67\", \"job-name\":\"internal-job\"}: must be 'bf7003b2-89f2-4c77-b884-da67ba936c67', spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"batch.kubernetes.io/controller-uid\":\"3e0c1fc3-2ab4-4527-bd9f-98a071a2b0ee\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated]" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:510" restore=openshift-adp/test-restore
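The errors above come from Job validation on the 4.14 cluster: the template label batch.kubernetes.io/controller-uid must equal the UID the new cluster assigned to the restored Job, but the restored manifest still carries the UID from the source cluster (note that the legacy controller-uid label in the error already matches the new UID, while the batch.kubernetes.io/controller-uid label does not). A rough sketch of that check, purely for illustration (names are hypothetical, not the actual apiserver code):

```python
# Illustrative sketch of the validation that rejects the restored Job:
# any UID-derived template label must match the UID the target cluster
# assigned to the new Job object. Hypothetical names, not real
# kube-apiserver code.

def validate_job_uid_labels(new_uid: str, template_labels: dict) -> list:
    """Return error strings for UID-derived labels that mismatch new_uid."""
    errors = []
    for key in ("controller-uid", "batch.kubernetes.io/controller-uid"):
        value = template_labels.get(key)
        if value is not None and value != new_uid:
            errors.append(
                f"spec.template.metadata.labels[{key}]: "
                f"Invalid value: {value!r}: must be '{new_uid}'"
            )
    return errors
```

In the external-job log entry above, the new UID is b858296c-... while the restored batch.kubernetes.io/controller-uid label holds 7b71b1c8-..., so validation fails and the restore is marked PartiallyFailed.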
Expected results:
The restore should complete successfully.
Additional info:
OCP 4.14 result
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/cam/job/oadp-1.1-tier1-tests/1005/
Tests are passing in OCP 4.13
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/cam/job/oadp-1.1-tier1-tests/1006/