Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: OADP 1.7.0
Affects Version/s: OADP 1.4.5
Component/s: fsbackup, restic, restore
Labels:

Activity Type:
Incidents & Support
Story Points:
3
Blocked:
False
Blocked Reason:

None
Ready:
False
QEStatus:
ToDo
Intelligence Requested:
Market:

Severity:
Important
Risk Probability:
Very Likely
Risk Score:
0
Customer Impact:

Customer Escalated, Customer Facing
Cost of Delay:
8

Workstream:

None

Root Cause:
Unset
Failure Category:
Unknown

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Description of problem:

In FSB (file system backup) mode, old PersistentVolumes (this cluster has a long upgrade history, starting at OCP 4.8) carry the labels 'failure-domain.beta.kubernetes.io/region' and 'failure-domain.beta.kubernetes.io/zone', which are now deprecated in favor of 'nodeAffinity ... topology.disk.csi.azure.com/zone|region' labels.

When a PV with these old labels is restored, the customer sees that, after the restic restore finishes, velero-server adds these 'failure-domain' labels to the restored PersistentVolume while the final restored pod is already up and running.

This only happens after restic completes. Before that, the PV has no labels, though it does have the topology nodeAffinity, which points to the zone where the pod (created during the initial phase of the restore) was scheduled.

The result is a restored pod scheduled onto a node in, for example, Zone A, and a PV whose 'failure-domain' labels point to Zone B while its original nodeAffinity still points to the node labels 'topology.disk.csi.azure.com/zone|region' in Zone A.
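
The mismatch described above can be checked on a PV manifest. The following is a minimal sketch, not part of Velero or OADP: it assumes the PV has been loaded as a plain dict (for example from `oc get pv <name> -o json`), and the zone values, the PV name, and the helper names `affinity_zones` and `legacy_zone_conflict` are hypothetical.

```python
LEGACY_ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"
CSI_ZONE_KEY = "topology.disk.csi.azure.com/zone"


def affinity_zones(pv: dict) -> set:
    """Collect zone values from 'In' expressions on the CSI zone key."""
    terms = (
        pv.get("spec", {})
        .get("nodeAffinity", {})
        .get("required", {})
        .get("nodeSelectorTerms", [])
    )
    zones = set()
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == CSI_ZONE_KEY and expr.get("operator") == "In":
                zones.update(expr.get("values", []))
    return zones


def legacy_zone_conflict(pv: dict):
    """Return (label_zone, affinity_zones) if they disagree, else None."""
    labels = pv.get("metadata", {}).get("labels") or {}
    label_zone = labels.get(LEGACY_ZONE_LABEL)
    zones = affinity_zones(pv)
    if label_zone is None or not zones or label_zone in zones:
        return None
    return label_zone, zones


# Hypothetical restored PV: legacy label says zone 2, nodeAffinity says zone 1.
example_pv = {
    "metadata": {
        "name": "pvc-example",
        "labels": {LEGACY_ZONE_LABEL: "westeurope-2"},
    },
    "spec": {
        "nodeAffinity": {
            "required": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": CSI_ZONE_KEY,
                                "operator": "In",
                                "values": ["westeurope-1"],
                            }
                        ]
                    }
                ]
            }
        }
    },
}

conflict = legacy_zone_conflict(example_pv)
```

A non-None result flags a PV whose legacy zone label disagrees with its nodeAffinity, which is the state that prevents the pod from restarting.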

If the pod is restarted, it fails to start because the volume now carries 'failure-domain' labels pointing to a different zone.
The labels have to be removed manually before the pod can start.

Version-Release number of selected component (if applicable):

OCP 4.18
OADP 1.4.5

How reproducible:
Always, in the customer environment, for PVs with old 'failure-domain' labels
Steps to Reproduce:
1. Have an OCP cluster with an upgrade history; in this case the cluster was initially installed on OCP 4.8 in Azure. The issue might also reproduce by adding the labels to PVs manually.
2. Have PVs created in earlier versions, when the 'failure-domain.beta.kubernetes.io' labels were added. Those labels are now deprecated in favor of the 'topology.disk.csi.azure.com' ones.
3. Back up and restore DeploymentConfigs that use these volumes.

Actual results:

After the pod is restored, the associated PV carries the old 'failure-domain' labels. When the pod is restarted, it cannot start, because Velero added the old 'failure-domain' labels to the PV and they point to a different zone than the one in the 'topology.disk.csi.azure.com' nodeAffinity of the restored PV.

Expected results:
Velero should skip the step that adds these labels after the restore completes.
Additional info:
Workaround: either manually delete the old labels from the restored PV, or remove them from the source PV before backing it up.
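
As an illustration of the manual workaround, the sketch below builds a JSON merge patch that removes the two deprecated labels from a PV. It is an assumption-laden example, not a supported tool: the helper name `label_removal_patch` and the sample labels are hypothetical, and it relies only on the merge-patch rule that a `null` value deletes a key.

```python
import json

DEPRECATED_LABELS = (
    "failure-domain.beta.kubernetes.io/region",
    "failure-domain.beta.kubernetes.io/zone",
)


def label_removal_patch(pv: dict) -> dict:
    """Build a merge patch that deletes deprecated labels present on the PV.

    Returns an empty dict when the PV has none of them.
    """
    labels = pv.get("metadata", {}).get("labels") or {}
    # In a JSON merge patch, setting a key to null (None) removes it.
    to_remove = {key: None for key in DEPRECATED_LABELS if key in labels}
    return {"metadata": {"labels": to_remove}} if to_remove else {}


# Hypothetical PV metadata; region and zone values are made up.
sample_pv = {
    "metadata": {
        "labels": {
            "failure-domain.beta.kubernetes.io/region": "westeurope",
            "failure-domain.beta.kubernetes.io/zone": "westeurope-2",
            "app": "demo",
        }
    }
}

patch = label_removal_patch(sample_pv)
patch_json = json.dumps(patch)
```

The resulting `patch_json` string could then be applied with a merge-type patch (for example `oc patch pv <name> --type=merge -p '<patch_json>'`); unrelated labels such as `app` are left untouched.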

A resource-modifier solution did not work because the pod to be restored does not have these old labels, and initially the PV to be restored does not have them either; Velero adds the labels only after restic completes the restore.

is related to

OADP-6699 Ignore OVN-K and multus annotations while backing-up/restoring pods

links to

https://github.com/vmware-tanzu/velero/issues/9358

Assignee: Wes Hayutin

Reporter: Javier Coscia

Votes: 0

Watchers: 3

Created: 2025/10/17 2:17 PM

Updated: 2026/01/05 7:35 PM
