Bug
Resolution: Done
Priority: Undefined
Affects Version: 4.14
Quality / Stability / Reliability
Severity: Important
Description of problem:
Cluster backup consistently fails when a ClusterGroupUpgrade (CGU) is created with backup: true.
Version-Release number of selected component (if applicable):
TALM v4.14.0-62
OCP 4.14.0-rc.1
How reproducible:
Always
Steps to Reproduce:
1. Install hub cluster with OCP 4.14.0-rc.1
2. Install latest TALM on the hub cluster
3. Provision a managed cluster with OCP 4.14.0-rc.1
4. Create a CGU with backup: true (see the sketch after this list)
5. Enable the CGU
6. The CGU fails with backup status: UnrecoverableError
7. View the backup agent pod logs on the managed cluster
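For reference, steps 4-6 can be driven from the hub with oc; a minimal sketch follows. The ran.openshift.io/v1alpha1 ClusterGroupUpgrade API is the one TALM serves, but the name cgu-backup, the default namespace, the cluster spoke1, and the policy du-upgrade-policy are illustrative placeholders, not values taken from this report, and the status field path reflects my reading of the TALM CRD rather than output captured here.

# Step 4: create the CGU with backup requested but not yet enabled.
# cgu-backup, spoke1, and du-upgrade-policy are placeholder names.
cat <<'EOF' | oc apply -f -
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-backup
  namespace: default
spec:
  backup: true
  enable: false
  clusters:
  - spoke1
  managedPolicies:
  - du-upgrade-policy
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
EOF

# Step 5: enable the CGU, which triggers the backup job on the spoke.
oc patch clustergroupupgrade cgu-backup -n default \
  --type merge -p '{"spec":{"enable":true}}'

# Step 6: the backup state is reported in the CGU status; print it raw.
oc get clustergroupupgrade cgu-backup -n default \
  -o jsonpath='{.status.backup}{"\n"}'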
Actual results:
Backup fails
Expected results:
Backup should succeed.
Additional info:
[kni@registry auth]$ oc logs -n openshift-talo-backup backup-agent-jnt9p --follow
INFO[0002] Successfully remounted /host/sysroot with r/w permission
INFO[0002] ------------------------------------------------------------
INFO[0002] Cleaning up old content...
INFO[0002] ------------------------------------------------------------
INFO[0002] fullpath: /var/recovery/upgrade-recovery.sh
INFO[0002] fullpath: /var/recovery/cluster
INFO[0002] fullpath: /var/recovery/etc.exclude.list
INFO[0002] fullpath: /var/recovery/etc
INFO[0002] fullpath: /var/recovery/local
INFO[0002] fullpath: /var/recovery/kubelet
INFO[0025] fullpath: /var/recovery/extras.tgz
INFO[0025] Old directories deleted with contents
INFO[0025] Old contents have been cleaned up
INFO[0031] Available disk space : 456.74 GiB; Estimated disk space required for backup: 32.28 GiB
INFO[0031] Sufficient disk space found to trigger backup
INFO[0031] Upgrade recovery script written
INFO[0031] Running: bash -c /var/recovery/upgrade-recovery.sh --take-backup --dir /var/recovery
INFO[0031] ##### Thu Sep 21 14:00:48 UTC 2023: Taking backup
INFO[0031] ##### Thu Sep 21 14:00:48 UTC 2023: Wiping previous deployments and pinning active
INFO[0031] error: Out of range deployment index 1, expected < 1
INFO[0031] Deployment 0 is already pinned
INFO[0031] ##### Thu Sep 21 14:00:48 UTC 2023: Backing up container cluster and required files
INFO[0031] Certificate /etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt is missing. Checking in different directory
INFO[0031] Certificate /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-serving-ca/ca-bundle.crt found!
INFO[0031] found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-9
INFO[0031] found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-6
INFO[0031] found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6
INFO[0031] found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-2
INFO[0031] etcdctl is already installed
INFO[0031] etcdutl is already installed
INFO[0031] {"level":"info","ts":"2023-09-21T14:00:48.48003Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/recovery/cluster/snapshot_2023-09-21_140048__POSSIBLY_DIRTY__.db.part"}
INFO[0031] {"level":"info","ts":"2023-09-21T14:00:48.490246Z","logger":"client","caller":"v3@v3.5.9/maintenance.go:212","msg":"opened snapshot stream; downloading"}
INFO[0031] {"level":"info","ts":"2023-09-21T14:00:48.49028Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://10.46.46.66:2379"}
INFO[0033] {"level":"info","ts":"2023-09-21T14:00:50.158759Z","logger":"client","caller":"v3@v3.5.9/maintenance.go:220","msg":"completed snapshot read; closing"}
INFO[0033] {"level":"info","ts":"2023-09-21T14:00:50.407955Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://10.46.46.66:2379","size":"115 MB","took":"1 second ago"}
INFO[0033] {"level":"info","ts":"2023-09-21T14:00:50.408049Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/recovery/cluster/snapshot_2023-09-21_140048__POSSIBLY_DIRTY__.db"}
INFO[0033] Snapshot saved at /var/recovery/cluster/snapshot_2023-09-21_140048__POSSIBLY_DIRTY__.db
INFO[0033] {"hash":1281395486,"revision":693323,"totalKey":7404,"totalSize":115171328}
INFO[0033] snapshot db and kube resources are successfully saved to /var/recovery/cluster
INFO[0034] Command succeeded: cp -Ra /etc/ /var/recovery/
INFO[0034] Command succeeded: cp -Ra /usr/local/ /var/recovery/
INFO[0099] Command succeeded: cp -Ra /var/lib/kubelet/ /var/recovery/
INFO[0099] tar: Removing leading `/' from member names
INFO[0099] tar: /var/lib/ovn-ic/etc/enable_dynamic_cpu_affinity: Cannot stat: No such file or directory
INFO[0099] tar: Exiting with failure status due to previous errors
INFO[0099] ##### Thu Sep 21 14:01:55 UTC 2023: Failed to backup additional managed files
ERRO[0099] exit status 1
Error: exit status 1
Usage:
  upgrade-recovery launchBackup [flags]

Flags:
  -h, --help   help for launchBackup

exit status 1
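The log narrows the failure down: the etcd snapshot and the cp steps succeed, and the run aborts only when tar archives the "additional managed files" and hits the absent /var/lib/ovn-ic/etc/enable_dynamic_cpu_affinity. GNU tar treats a path it cannot stat as a fatal error and exits non-zero, which upgrade-recovery.sh propagates. A minimal sketch of the failure mode and one tolerant variant follows; the path list is illustrative, not the actual list compiled inside upgrade-recovery.sh.

# Reproduces the failure mode on any Linux host: a single missing
# member makes GNU tar exit non-zero even though the rest of the
# archive is written.
tar czf /tmp/extras.tgz /no/such/file
echo $?   # 2: tar treats "Cannot stat" as a fatal error

# One tolerant variant: drop paths that do not exist before invoking
# tar. The paths below are illustrative placeholders only.
wanted=(/var/lib/ovn-ic/etc/enable_dynamic_cpu_affinity /etc/kubernetes)
present=()
for f in "${wanted[@]}"; do
  [ -e "$f" ] && present+=("$f")    # keep only paths that exist
done
[ "${#present[@]}" -gt 0 ] && tar czf /tmp/extras.tgz "${present[@]}"

# GNU tar's --ignore-failed-read flag is another option: it downgrades
# unreadable or missing members to warnings instead of a fatal error.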
blocks: OCPBUGS-19637 Cluster Backup Fails in upgrade-recovery.sh (Closed)
is cloned by: OCPBUGS-19637 Cluster Backup Fails in upgrade-recovery.sh (Closed)
links to: RHEA-2024:128510 OpenShift Container Platform 4.15.1 CNF vRAN extras update