OpenShift API for Data Protection / OADP-726

Restore using "csi" shows "COMPLETED" , although the pods pvc stuck on pending , " waiting for a volume to be created"


    • Type: Bug
    • Resolution: Done
    • Priority: Minor
    • OADP 1.1.0
    • Component/s: csi-plugin, velero

      Description of problem:

      Running a restore using "csi" shows "Completed", although the pods' PVCs are stuck in Pending, "waiting for a volume to be created".

      I have tried more than one pod type, with data and with apps, on the storage class ocs-storagecluster-ceph-rbd.
      Both of them failed at the PVC claim step after the restore showed "Completed".
      The error: "waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator"
      There was no issue with the PVCs when the pods were originally created, for either pod type.
      Attached are the following files:
       * velero-5d9dcf486b-g7wnt.log
       * openshift-adp-controller-manager-5b859ccfc-m5c6d.log
       * ocs-storagecluster-rbdplugin-snapclass.txt
       * perf-project-1_events.log
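
      Since the rbd snapshot class is attached above (ocs-storagecluster-rbdplugin-snapclass.txt), one minimal check (a sketch, not a confirmed root cause) is whether that class carries the label the Velero CSI plugin uses to select snapshot classes, and what its deletionPolicy is:

       # Inspect the snapshot class used by the rbd provisioner
       oc get volumesnapshotclass ocs-storagecluster-rbdplugin-snapclass -o yaml

       # The CSI plugin only picks classes labeled velero.io/csi-volumesnapshot-class=true
       oc get volumesnapshotclass -l velero.io/csi-volumesnapshot-class=true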

      Backup resource:

       apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.23.5+012e945
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "23"
        creationTimestamp: "2022-08-17T11:48:08Z"
        generation: 8
        labels:
          velero.io/storage-location: example-velero-1
        name: latest-backup
        namespace: openshift-adp
        resourceVersion: "875684942"
        uid: 76a474ac-9768-4309-94d3-4ecc63a90dc7
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToRestic: false
        hooks: {}
        includedNamespaces:
        - perf-project-1
        metadata: {}
        storageLocation: example-velero-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2022-08-17T11:49:09Z"
        csiVolumeSnapshotsAttempted: 4
        csiVolumeSnapshotsCompleted: 4
        expiration: "2022-09-16T11:48:08Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 142
          totalItems: 142
        startTimestamp: "2022-08-17T11:48:08Z"
        version: 1
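
      The backup status reports csiVolumeSnapshotsAttempted: 4 and csiVolumeSnapshotsCompleted: 4, so a useful follow-up (a sketch, assuming the velero CLI is available and pointed at the openshift-adp namespace) is to confirm the four snapshots and their cluster-scoped contents are actually ReadyToUse:

       # Per-volume CSI snapshot status recorded for this backup
       velero backup describe latest-backup --details -n openshift-adp

       # Readiness of the VolumeSnapshotContents behind them
       oc get volumesnapshotcontent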
      

      Restore resource:

       apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2022-08-17T11:53:50Z"
        generation: 8
        name: latest-restore
        namespace: openshift-adp
        resourceVersion: "875737797"
        uid: ce0f4e06-7f03-4715-9117-47ad7505a2f9
      spec:
        backupName: latest-backup
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        hooks: {}
        includedNamespaces:
        - '*'
      status:
        completionTimestamp: "2022-08-17T11:54:10Z"
        phase: Completed
        progress:
          itemsRestored: 74
          totalItems: 74
        startTimestamp: "2022-08-17T11:53:50Z"
        warnings: 8
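
      The restore finished Completed but with warnings: 8; those warnings and the restore log (a sketch, same assumption about the velero CLI namespace) should show how the PVCs and their VolumeSnapshot dataSources were handled:

       # The 8 warnings and per-resource restore results
       velero restore describe latest-restore --details -n openshift-adp

       # Restore log filtered to volume handling
       velero restore logs latest-restore -n openshift-adp | grep -i -e volumesnapshot -e persistentvolumeclaim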
      

      All 4 pods are in Error state after the "Completed" restore:

      [root@f01-h07-000-r640 oadp-helpers]# oc get pods  -nperf-project-1
      NAME                  READY   STATUS   RESTARTS   AGE
      mariadb-1-deploy      0/1     Error    0          24m
      mongodb-1-deploy      0/1     Error    0          24m
      postgresql-1-deploy   0/1     Error    0          24m
      redis-1-deploy        0/1     Error    0          24m
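
      The four failing pods are the deployer pods (*-1-deploy), so their logs (just a sketch of the check) would presumably show the DeploymentConfig rollouts timing out while the application pods wait on the unbound PVCs:

       oc logs mariadb-1-deploy -n perf-project-1
       oc get rc -n perf-project-1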

      The error from the target namespace PVCs:

       [root@f01-h07-000-r640 oadp-helpers]# oc get pvc -nperf-project-1
      NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
      mariadb      Pending                                      ocs-storagecluster-ceph-rbd   14m
      mongodb      Pending                                      ocs-storagecluster-ceph-rbd   14m
      postgresql   Pending                                      ocs-storagecluster-ceph-rbd   14m
      redis        Pending                                      ocs-storagecluster-ceph-rbd   14m
      

      Events output from one app whose PVC is pending:

      16m         Normal    Created                       pod/redis-1-deploy                       Created container deployment
      16m         Normal    Started                       pod/redis-1-deploy                       Started container deployment
      16m         Warning   FailedScheduling              pod/redis-1-srs76                        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
      9m25s       Warning   FailedScheduling              pod/redis-1-srs76                        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
      6m47s       Warning   FailedScheduling              pod/redis-1-srs76                        skip schedule deleting pod: perf-project-1/redis-1-srs76
      16m         Normal    SuccessfulCreate              replicationcontroller/redis-1            Created pod: redis-1-srs76
      6m48s       Normal    SuccessfulDelete              replicationcontroller/redis-1            Deleted pod: redis-1-srs76
      102s        Normal    ExternalProvisioning          persistentvolumeclaim/redis              waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
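
      Since the provisioner never creates the volume, the next checks (a sketch; names taken from the outputs above) are what the restored PVC's dataSource points at, and whether that VolumeSnapshot exists and is ReadyToUse in the target namespace:

       # dataSource of the restored claim (for a CSI restore this should reference a VolumeSnapshot)
       oc get pvc redis -n perf-project-1 -o jsonpath='{.spec.dataSource}{"\n"}'

       # Restored snapshots in the namespace and their readiness
       oc get volumesnapshot -n perf-project-1

       # The rbd provisioner pods themselves, in case the CSI driver side is stuck
       oc get pods -n openshift-storage | grep csi-rbdplugin-provisioner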

      Version-Release number of selected component (if applicable):

      Tested and verified on OADP builds:

       * iib293185
       * iib294611

      OCP: 4.10.23

      cloud15

              Assignee: Wes Hayutin (wnstb)
              Reporter: Tzahi Ashkenazi (tzahia)