  1. OpenShift API for Data Protection
  2. OADP-7542

Velero backup for Windows VM creates CSI VolumeSnapshot sequentially causing VSS freeze timeout resulting in application-inconsistent backups


    • Quality / Stability / Reliability
    • None
    • 3
    • False
    • False
    • ToDo
    • Important
    • Very Likely
    • 0
    • Customer Escalated, Customer Facing, Customer Reported
    • 8
    • Unset
    • Unknown
    • None

      When OADP (Velero) is used to back up OpenShift Virtualization VMs with CSI, Velero requests the CSI VolumeSnapshots sequentially for a Windows VM with multiple disks. The combined snapshot time exceeds the 10-second VSS timeout for shadow copy creation, resulting in an application-inconsistent backup.

      DPA configuration

       

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        annotations:
      xxxxxx-adp:oadp.openshift.io/DataProtectionApplication:openshift-adp/vm-backups'
        creationTimestamp: '2025-06-24T06:49:01Z'
        generation: 1
      spec:
        backupLocations:
          - velero:
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: xxxxx-backup
                prefix: velero
              provider: aws
        configuration:
          nodeAgent:
            enable: true
            uploaderType: restic
          velero:
            defaultPlugins:
              - openshift
              - aws
              - kubevirt
            featureFlags:
              - EnableCSI
      status:
        conditions:
          - lastTransitionTime: '2026-02-12T14:22:20Z'
            message: Reconcile complete
            reason: Complete
            status: 'True'
            type: Reconciled 

      Observed Behavior

      During VM backup:
      1.) Velero executes the freeze hook using virt-freezer. The qemu-guest-agent communicates with VSS and performs the OS freeze.

      virt-freezer --freeze

      2.) Velero identifies the disks/PVCs attached to the VM.

      3.) Velero requests CSI to create a VolumeSnapshot for each disk sequentially.

      4.) Each VolumeSnapshot waits for CSI reconciliation before the next snapshot starts.

      5.) The total freeze duration exceeds the Windows VSS freeze limit (~10 seconds).

      6.) When Velero executes the unfreeze hook

      virt-freezer --unfreeze

      the QEMU guest agent returns:

      "Failed to unfreeze vmi, virError(Code=1, Domain=10, Message='internal error: unable to execute QEMU agent command 'guest-fsfreeze-thaw': couldn't hold writes: fsfreeze is limited up to 10 seconds
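
      The timing problem in steps 3 to 5 can be illustrated with a small model. This is only a sketch: the per-disk latencies are made-up numbers, the 10-second limit is taken from the guest agent error above, and both helper functions are hypothetical and not part of Velero or KubeVirt.

```python
from itertools import accumulate

# 10-second VSS freeze limit, as reported in the guest agent error above.
FREEZE_LIMIT_SECONDS = 10.0


def sequential_completion_times(latencies):
    """Return the elapsed time at which each snapshot completes when
    snapshots are requested one after another (each waits for the previous)."""
    return list(accumulate(latencies))


def taken_while_frozen(latencies, limit=FREEZE_LIMIT_SECONDS):
    """Flag, per disk, whether its snapshot completes inside the freeze window."""
    return [t <= limit for t in sequential_completion_times(latencies)]


# Hypothetical per-disk CSI snapshot latencies in seconds (illustrative only).
latencies = [4.0, 4.0, 4.0]
print(sequential_completion_times(latencies))  # [4.0, 8.0, 12.0]
print(taken_while_frozen(latencies))           # [True, True, False]
```

      With these assumed numbers the third disk is snapshotted after the freeze window has closed, even though no single snapshot takes anywhere near 10 seconds.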

      Actual Result

      Although all VolumeSnapshots are successfully created and marked as ready to use, the snapshots were taken after the OS had already been unfrozen and VSS had resumed write operations, resulting in an application-inconsistent backup.
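
      One way to confirm which snapshots missed the freeze window is to compare each VolumeSnapshot's `status.creationTime` with the time the freeze hook ran. The sketch below assumes those values have already been collected by hand; the freeze start time, the disk names, and all timestamps are hypothetical, and `snapshots_after_thaw` is an illustrative helper rather than an existing tool.

```python
from datetime import datetime, timedelta

# 10-second VSS freeze limit from the guest agent error message.
FREEZE_LIMIT = timedelta(seconds=10)


def parse_k8s_time(value):
    """Parse a UTC timestamp in the form Kubernetes prints, e.g. '2026-02-12T14:22:20Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def snapshots_after_thaw(freeze_started, snapshot_times, limit=FREEZE_LIMIT):
    """Return the names of snapshots whose creation time falls after the freeze window."""
    deadline = parse_k8s_time(freeze_started) + limit
    return [
        name
        for name, created in snapshot_times.items()
        if parse_k8s_time(created) > deadline
    ]


# Hypothetical values: freeze hook time and per-disk status.creationTime.
freeze_started = "2026-02-12T14:00:00Z"
snapshot_times = {
    "disk-0": "2026-02-12T14:00:04Z",
    "disk-1": "2026-02-12T14:00:09Z",
    "disk-2": "2026-02-12T14:00:13Z",
}
print(snapshots_after_thaw(freeze_started, snapshot_times))  # ['disk-2']
```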

      Expected Result

      All VolumeSnapshots should be completed within the 10-second VSS timeout so that the backup is application-consistent.
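
      As a rough illustration of the expected behavior, the sketch below issues all snapshot requests concurrently and enforces a deadline on the whole batch. It is not Velero code: `create_snapshot` is a stand-in that sleeps instead of calling CSI, the latencies are hypothetical and scaled down, and it assumes the storage backend can service the requests in parallel.

```python
import asyncio
import time


async def create_snapshot(disk, latency):
    """Stand-in for a CSI VolumeSnapshot request; sleeps instead of calling an API."""
    await asyncio.sleep(latency)
    return disk


async def snapshot_all(disks, freeze_limit):
    """Request all snapshots at once; raise a timeout error if the batch
    does not finish within the freeze limit."""
    tasks = [create_snapshot(disk, latency) for disk, latency in disks.items()]
    return await asyncio.wait_for(asyncio.gather(*tasks), timeout=freeze_limit)


# Hypothetical latencies, scaled down (0.05 s stands in for several seconds).
disks = {"disk-0": 0.05, "disk-1": 0.05, "disk-2": 0.05}
start = time.monotonic()
done = asyncio.run(snapshot_all(disks, freeze_limit=0.5))
elapsed = time.monotonic() - start
print(done)  # ['disk-0', 'disk-1', 'disk-2']
print(f"elapsed {elapsed:.2f}s, sum of latencies {sum(disks.values()):.2f}s")
```

      When the requests overlap, the batch takes roughly as long as the slowest snapshot rather than the sum of all of them, which is what keeps it inside the freeze window.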

              sseago Scott Seago
              rhn-support-kshukla Koutuk Shukla