OpenShift API for Data Protection / OADP-7363

Backup Phase "Completed" despite Failed CSI Item Operations


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • OADP 1.5.4
    • velero
    • None
    • Quality / Stability / Reliability
    • 3
    • False
    • False
    • ToDo
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • None

      Description of problem:

      A backup that uses the CSI plugin reports phase Completed even though multiple Backup Item Operations for VolumeSnapshots failed with API errors. No errors are reported in the Velero logs. The Velero log is attached below.

       

      Name:         mysql-7618ea2b-f60c-11f0-93c7-0a58ac1101dd
      Namespace:    openshift-adp
      Labels:       velero.io/storage-location=ts-dpa-1
      Annotations:  velero.io/resource-timeout=10m0s
                    velero.io/source-cluster-k8s-gitversion=v1.32.10
                    velero.io/source-cluster-k8s-major-version=1
                    velero.io/source-cluster-k8s-minor-version=32
      
      Phase:  Completed
      
      
      Namespaces:
        Included:  test-oadp-548
        Excluded:  <none>
      
      Resources:
        Included:        *
        Excluded:        <none>
        Cluster-scoped:  auto
      
      Label selector:  <none>
      
      Or label selector:  <none>
      
      Storage Location:  ts-dpa-1
      
      Velero-Native Snapshot PVs:  auto
      Snapshot Move Data:          false
      Data Mover:                  velero
      
      TTL:  720h0m0s
      
      CSISnapshotTimeout:    10m0s
      ItemOperationTimeout:  4h0m0s
      
      Hooks:  <none>
      
      Backup Format Version:  1.1.0
      
      Started:    2026-01-20 14:30:54 +0000 UTC
      Completed:  2026-01-20 14:32:30 +0000 UTC
      
      Expiration:  2026-02-19 14:30:54 +0000 UTC
      
      Total items to be backed up:  48
      Items backed up:              48
      
      Backup Item Operations:
        Operation for volumesnapshots.snapshot.storage.k8s.io test-oadp-548/velero-mysql-data-nxfcj:
          Backup Item Action Plugin:  velero.io/csi-volumesnapshot-backupper
          Operation ID:               test-oadp-548/velero-mysql-data-nxfcj/2026-01-20T14:30:59Z
          Items to Update:
                            volumesnapshots.snapshot.storage.k8s.io test-oadp-548/velero-mysql-data-nxfcj
                            volumesnapshotcontents.snapshot.storage.k8s.io /snapcontent-8e4cff65-afe7-47b4-b260-d5501ee5a3a4
          Phase:            Failed
          Operation Error:  rpc error: code = Unknown desc = failed to get API group resources: unable to retrieve the complete list of server APIs: snapshot.storage.k8s.io/v1: Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1": dial tcp 172.30.0.1:443: connect: connection refused
          Created:          2026-01-20 14:30:59 +0000 UTC
          Started:          2026-01-20 14:30:59 +0000 UTC
        Operation for volumesnapshots.snapshot.storage.k8s.io test-oadp-548/velero-mysql-data1-j2fq8:
          Backup Item Action Plugin:  velero.io/csi-volumesnapshot-backupper
          Operation ID:               test-oadp-548/velero-mysql-data1-j2fq8/2026-01-20T14:31:04Z
          Items to Update:
                            volumesnapshots.snapshot.storage.k8s.io test-oadp-548/velero-mysql-data1-j2fq8
                            volumesnapshotcontents.snapshot.storage.k8s.io /snapcontent-3ae5a696-d1e0-406a-ab92-099077013ddf
          Phase:            Failed
          Operation Error:  rpc error: code = Unknown desc = failed to get API group resources: unable to retrieve the complete list of server APIs: snapshot.storage.k8s.io/v1: Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1": dial tcp 172.30.0.1:443: connect: connection refused
          Created:          2026-01-20 14:31:04 +0000 UTC
          Started:          2026-01-20 14:31:04 +0000 UTC
      Resource List:
        apps/v1/Deployment:
          - test-oadp-548/mysql
        apps/v1/ReplicaSet:
          - test-oadp-548/mysql-64c9d6466
        authorization.openshift.io/v1/RoleBinding:
          - test-oadp-548/system:deployers
          - test-oadp-548/system:image-builders
          - test-oadp-548/system:image-pullers
        discovery.k8s.io/v1/EndpointSlice:
          - test-oadp-548/mysql-h498m
        rbac.authorization.k8s.io/v1/RoleBinding:
          - test-oadp-548/system:deployers
          - test-oadp-548/system:image-builders
          - test-oadp-548/system:image-pullers
        snapshot.storage.k8s.io/v1/VolumeSnapshot:
          - test-oadp-548/velero-mysql-data-nxfcj
          - test-oadp-548/velero-mysql-data1-j2fq8
        snapshot.storage.k8s.io/v1/VolumeSnapshotClass:
          - example-snapclass
        snapshot.storage.k8s.io/v1/VolumeSnapshotContent:
          - snapcontent-3ae5a696-d1e0-406a-ab92-099077013ddf
          - snapcontent-8e4cff65-afe7-47b4-b260-d5501ee5a3a4
        v1/ConfigMap:
          - test-oadp-548/kube-root-ca.crt
          - test-oadp-548/openshift-service-ca.crt
        v1/Endpoints:
          - test-oadp-548/mysql
        v1/Event:
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d138e47001
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d1c39a3f08
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d1c6c172eb
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d3784d8eb0
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d379baa2fb
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d37ec9fbb9
          - test-oadp-548/mysql-64c9d6466-jtclk.188c76d37f40d16e
          - test-oadp-548/mysql-64c9d6466.188c76d0847448b7
          - test-oadp-548/mysql-data.188c76d0794ab82f
          - test-oadp-548/mysql-data.188c76d08595fa44
          - test-oadp-548/mysql-data.188c76d085acf1e7
          - test-oadp-548/mysql-data.188c76d103df9a5f
          - test-oadp-548/mysql-data1.188c76d07e4055c9
          - test-oadp-548/mysql-data1.188c76d08630d7e5
          - test-oadp-548/mysql-data1.188c76d086331235
          - test-oadp-548/mysql-data1.188c76d103fb5287
          - test-oadp-548/mysql.188c76d083da6ae3
        v1/Namespace:
          - test-oadp-548
        v1/PersistentVolume:
          - pvc-53b2ef2b-d99e-40d3-8a64-0bf385d5c75e
          - pvc-b7abb41f-f2ce-402e-9922-bd77a6afc283
        v1/PersistentVolumeClaim:
          - test-oadp-548/mysql-data
          - test-oadp-548/mysql-data1
        v1/Pod:
          - test-oadp-548/mysql-64c9d6466-jtclk
        v1/Secret:
          - test-oadp-548/builder-dockercfg-g4mbp
          - test-oadp-548/default-dockercfg-2wm2v
          - test-oadp-548/deployer-dockercfg-wmf6v
          - test-oadp-548/mysql
        v1/Service:
          - test-oadp-548/mysql
        v1/ServiceAccount:
          - test-oadp-548/builder
          - test-oadp-548/default
          - test-oadp-548/deployer
      
      Backup Volumes:
        Velero-Native Snapshots: <none included>
      
        CSI Snapshots:
          test-oadp-548/mysql-data:
            Snapshot:
              Operation ID: test-oadp-548/velero-mysql-data-nxfcj/2026-01-20T14:30:59Z
              Snapshot Content Name: snapcontent-8e4cff65-afe7-47b4-b260-d5501ee5a3a4
              Storage Snapshot ID: snap-0c5e55a306dae4bae
              Snapshot Size (bytes): 2147483648
              CSI Driver: ebs.csi.aws.com
              Result: failed
          test-oadp-548/mysql-data1:
            Snapshot:
              Operation ID: test-oadp-548/velero-mysql-data1-j2fq8/2026-01-20T14:31:04Z
              Snapshot Content Name: snapcontent-3ae5a696-d1e0-406a-ab92-099077013ddf
              Storage Snapshot ID: snap-0f4c0513582beb9da
              Snapshot Size (bytes): 2147483648
              CSI Driver: ebs.csi.aws.com
              Result: failed
      
        Pod Volume Backups: <none included>
      
      HooksAttempted:  0
      HooksFailed:     0 
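
      The mismatch in the output above (top-level Phase: Completed, item operations in Phase: Failed) can be detected mechanically. The following is a minimal sketch, not part of Velero or OADP: the function name summarize_describe and the SAMPLE text are hypothetical, and the parser assumes the describe output keeps the section headings shown in this report ("Backup Item Operations:" followed by "Resource List:").

```python
def summarize_describe(text):
    """Return (backup_phase, failed_operations) from describe-style text.

    Assumes the layout shown in this report: a top-level "Phase:" line,
    then a "Backup Item Operations:" section that ends at "Resource List:".
    """
    backup_phase = None
    failed = []
    current_op = None
    in_ops = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Backup Item Operations:":
            in_ops = True
            continue
        if in_ops and line == "Resource List:":
            in_ops = False
            continue
        if in_ops and line.startswith("Operation for "):
            # Remember which item the following "Phase:" line belongs to.
            current_op = line[len("Operation for "):].rstrip(":")
        elif line.startswith("Phase:"):
            value = line.split(":", 1)[1].strip()
            if in_ops:
                if value == "Failed" and current_op is not None:
                    failed.append(current_op)
            elif backup_phase is None:
                backup_phase = value
    return backup_phase, failed


# Shortened excerpt of the output in this report (illustrative only).
SAMPLE = """\
Phase:  Completed

Backup Item Operations:
  Operation for volumesnapshots.snapshot.storage.k8s.io test-oadp-548/velero-mysql-data-nxfcj:
    Operation ID:     test-oadp-548/velero-mysql-data-nxfcj/2026-01-20T14:30:59Z
    Phase:            Failed
  Operation for volumesnapshots.snapshot.storage.k8s.io test-oadp-548/velero-mysql-data1-j2fq8:
    Operation ID:     test-oadp-548/velero-mysql-data1-j2fq8/2026-01-20T14:31:04Z
    Phase:            Failed
Resource List:
  v1/Namespace:
    - test-oadp-548
"""

phase, failed_ops = summarize_describe(SAMPLE)
inconsistent = phase == "Completed" and len(failed_ops) > 0
```

      For the excerpt above, inconsistent is true: the backup phase is Completed while two VolumeSnapshot operations are Failed, which is the condition this report describes.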

       

      Version-Release number of selected component (if applicable):
      OADP 1.5.4 (1.5.z)

       

      How reproducible:
      Difficult to reproduce: it only happens when the API server is heavily loaded. We have seen this only on an AWS SNO (single-node OpenShift) cluster.

       

      Steps to Reproduce:
      1. Create a CSI backup on a heavily loaded cluster.
      2. Inspect the backup describe output.

      Actual results:

      The backup phase is reported as Completed.

       

      Expected results:

      The backup phase should be PartiallyFailed, because the CSI plugin reported errors for multiple item operations.
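
      The expected behavior can be stated as a small rule. This is a sketch of the reporter's expectation only, not Velero's actual phase logic: the ItemOperation type and the expected_backup_phase function are hypothetical names, and the rule assumes that any failed item operation should downgrade the backup from Completed to PartiallyFailed.

```python
from dataclasses import dataclass


@dataclass
class ItemOperation:
    """Hypothetical stand-in for one entry under "Backup Item Operations"."""
    resource: str
    phase: str
    error: str = ""


def expected_backup_phase(operations):
    """Phase this report expects: PartiallyFailed if any operation failed."""
    if any(op.phase == "Failed" for op in operations):
        return "PartiallyFailed"
    return "Completed"


# The two operations from this report, with the error text shortened.
reported = [
    ItemOperation("test-oadp-548/velero-mysql-data-nxfcj", "Failed",
                  "connection refused"),
    ItemOperation("test-oadp-548/velero-mysql-data1-j2fq8", "Failed",
                  "connection refused"),
]
result = expected_backup_phase(reported)
```

      Under this rule the backup in this report would end in PartiallyFailed instead of Completed.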

       

      Additional info:

        1. velero.log
          1.41 MB
          Prasad Joshi

              wnstb Wes Hayutin
              rhn-support-prajoshi Prasad Joshi
              Prasad Joshi Prasad Joshi