OpenShift Bugs / OCPBUGS-19555

Cluster Backup Fails in upgrade-recovery.sh



      Description of problem:

      Cluster backup consistently fails when a ClusterGroupUpgrade (CGU) is created with backup: true.

      Version-Release number of selected component (if applicable):

      TALM v4.14.0-62
      OCP 4.14.0-rc.1

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install hub cluster with OCP 4.14.0-rc.1
      2. Install latest TALM on hub cluster
      3. Provision managed cluster with OCP 4.14.0-rc.1
      4. Create a CGU with backup: true (a sample manifest is sketched after these steps)
      5. Enable CGU
      6. CGU fails with backup status: UnrecoverableError
      7. View the backup agent pod logs on the managed cluster
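
      For reference, steps 4 and 5 amount to something like the sketch below. The CGU name, namespace, spoke cluster, and managed policy are placeholders rather than values from the failing environment; the fields follow TALM's ran.openshift.io/v1alpha1 ClusterGroupUpgrade API.

      # cgu-backup-test.yaml (sketch; placeholder names throughout)
      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        name: cgu-backup-test
        namespace: default
      spec:
        backup: true        # step 4: request a cluster backup before remediation
        enable: false       # created disabled; flipped to true in step 5
        clusters:
        - sno-spoke1
        managedPolicies:
        - du-upgrade-policy
        remediationStrategy:
          maxConcurrency: 1
          timeout: 240

      # Step 4: create the CGU, then step 5: enable it.
      oc apply -f cgu-backup-test.yaml
      oc -n default patch clustergroupupgrade cgu-backup-test --type merge -p '{"spec":{"enable":true}}'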

      Actual results:

      Backup fails with backup status UnrecoverableError; the backup-agent pod exits with status 1 after tar fails on a missing file (see logs below).

      Expected results:

      Backup should succeed.

      Additional info:

      [kni@registry auth]$ oc logs -n openshift-talo-backup backup-agent-jnt9p --follow
      INFO[0002] Successfully remounted /host/sysroot with r/w permission 
      INFO[0002] ------------------------------------------------------------ 
      INFO[0002] Cleaning up old content...                   
      INFO[0002] ------------------------------------------------------------ 
      INFO[0002] 
      fullpath: /var/recovery/upgrade-recovery.sh 
      INFO[0002] 
      fullpath: /var/recovery/cluster             
      INFO[0002] 
      fullpath: /var/recovery/etc.exclude.list    
      INFO[0002] 
      fullpath: /var/recovery/etc                 
      INFO[0002] 
      fullpath: /var/recovery/local               
      INFO[0002] 
      fullpath: /var/recovery/kubelet             
      INFO[0025] 
      fullpath: /var/recovery/extras.tgz          
      INFO[0025] Old directories deleted with contents        
      INFO[0025] Old contents have been cleaned up            
      INFO[0031] Available disk space : 456.74 GiB; Estimated disk space required for backup: 32.28 GiB  
      INFO[0031] Sufficient disk space found to trigger backup 
      INFO[0031] Upgrade recovery script written              
      INFO[0031] Running: bash -c /var/recovery/upgrade-recovery.sh --take-backup --dir /var/recovery 
      INFO[0031] ##### Thu Sep 21 14:00:48 UTC 2023: Taking backup 
      INFO[0031] ##### Thu Sep 21 14:00:48 UTC 2023: Wiping previous deployments and pinning active 
      INFO[0031] error: Out of range deployment index 1, expected < 1 
      INFO[0031] Deployment 0 is already pinned               
      INFO[0031] ##### Thu Sep 21 14:00:48 UTC 2023: Backing up container cluster and required files 
      INFO[0031] Certificate /etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt is missing. Checking in different directory 
      INFO[0031] Certificate /etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-serving-ca/ca-bundle.crt found! 
      INFO[0031] found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-9 
      INFO[0031] found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-6 
      INFO[0031] found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-6 
      INFO[0031] found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-2 
      INFO[0031] etcdctl is already installed                 
      INFO[0031] etcdutl is already installed                 
      INFO[0031] {"level":"info","ts":"2023-09-21T14:00:48.48003Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/recovery/cluster/snapshot_2023-09-21_140048__POSSIBLY_DIRTY__.db.part"} 
      INFO[0031] {"level":"info","ts":"2023-09-21T14:00:48.490246Z","logger":"client","caller":"v3@v3.5.9/maintenance.go:212","msg":"opened snapshot stream; downloading"} 
      INFO[0031] {"level":"info","ts":"2023-09-21T14:00:48.49028Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://10.46.46.66:2379"} 
      INFO[0033] {"level":"info","ts":"2023-09-21T14:00:50.158759Z","logger":"client","caller":"v3@v3.5.9/maintenance.go:220","msg":"completed snapshot read; closing"} 
      INFO[0033] {"level":"info","ts":"2023-09-21T14:00:50.407955Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://10.46.46.66:2379","size":"115 MB","took":"1 second ago"} 
      INFO[0033] {"level":"info","ts":"2023-09-21T14:00:50.408049Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/recovery/cluster/snapshot_2023-09-21_140048__POSSIBLY_DIRTY__.db"} 
      INFO[0033] Snapshot saved at /var/recovery/cluster/snapshot_2023-09-21_140048__POSSIBLY_DIRTY__.db 
      INFO[0033] {"hash":1281395486,"revision":693323,"totalKey":7404,"totalSize":115171328} 
      INFO[0033] snapshot db and kube resources are successfully saved to /var/recovery/cluster 
      INFO[0034] Command succeeded: cp -Ra /etc/ /var/recovery/ 
      INFO[0034] Command succeeded: cp -Ra /usr/local/ /var/recovery/ 
      INFO[0099] Command succeeded: cp -Ra /var/lib/kubelet/ /var/recovery/ 
      INFO[0099] tar: Removing leading `/' from member names  
      INFO[0099] tar: /var/lib/ovn-ic/etc/enable_dynamic_cpu_affinity: Cannot stat: No such file or directory 
      INFO[0099] tar: Exiting with failure status due to previous errors 
      INFO[0099] ##### Thu Sep 21 14:01:55 UTC 2023: Failed to backup additional managed files 
      ERRO[0099] exit status 1                                
      Error: exit status 1
      Usage:
        upgrade-recovery launchBackup [flags]

      Flags:
        -h, --help   help for launchBackup
      exit status 1
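
      The fatal error above is tar failing on /var/lib/ovn-ic/etc/enable_dynamic_cpu_affinity, which does not exist on this node; upgrade-recovery.sh treats the resulting nonzero tar exit as fatal and aborts the backup. (The earlier "Out of range deployment index" message is non-fatal; the script continues past it.) One possible guard is sketched below, under the assumption that the script assembles a fixed list of extra managed files before archiving them; the variable names and list contents are illustrative, not the script's actual code.

      # Illustrative sketch, not the actual upgrade-recovery.sh code:
      # archive only the managed files present on this node, so an
      # absent optional file cannot abort the whole backup.
      extra_files=(
          /var/lib/ovn-ic/etc/enable_dynamic_cpu_affinity
          # ...remaining managed files...
      )
      existing=()
      for f in "${extra_files[@]}"; do
          [ -e "$f" ] && existing+=("$f")
      done
      # Skip tar entirely if nothing on the list exists on this node.
      if [ "${#existing[@]}" -gt 0 ]; then
          tar czf /var/recovery/extras.tgz "${existing[@]}"
      fi

      Alternatively, GNU tar's --ignore-failed-read option keeps archive creation from exiting nonzero on unreadable members, though filtering the list first makes a genuinely required-but-missing file easier to notice.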
      

            Assignee: Sharat Akhoury (sakhoury@redhat.com)
            Reporter: Joshua Clark (josclark@redhat.com)