Bug
Resolution: Unresolved
Major
We are demo.redhat.com, proving out MTV mass migration for customer workshops.
Our OCP+Virt clusters are running as guest VMs on OCP+Virt deployed on IBM bare metal.
We are using mostly external Ceph storage.
We have clusters available with similar failure conditions. Will provide access credentials when this issue is picked up.
Background from another cluster's attempt:
MTV deployment of 40 users (40 plans, 160 VMs) began to fail.
- OOM errors from the forklift-controller; forklift-controller is in CrashLoopBackOff
- forklift-controller has two containers: main and inventory.
- main is failing with OOM
- inventory is OK
We are having storage performance issues. Here is what User 30 experienced.
The forklift-volume-populator-controller pod complained that pvc 'pvc/vmexamples-user30/prime-992a2ced-f6cb-4e67-a445-d806f20656e5' in work queue no longer exists.
PVC no longer exists? Why?
PVC plan-user30-vm-54068-r9kbn Error saving claim: Operation cannot be fulfilled on persistentvolumeclaims "plan-user30-vm-54068-r9kbn": the object has been modified; please apply your changes to the latest version and try again
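For context on the "object has been modified" message: it is the API server's optimistic-concurrency conflict, returned when an update carries a stale resourceVersion. Controllers usually re-read the object and retry, so this error alone may not be the root cause. The sketch below illustrates the mechanism with a purely hypothetical in-memory store (`FakeStore`, `Conflict`, and `update_with_retry` are illustrative names, not part of any Kubernetes client):

```python
import copy


class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class FakeStore:
    """Hypothetical in-memory stand-in for the API server's copy of one PVC."""

    def __init__(self, obj):
        self._obj = dict(copy.deepcopy(obj), resourceVersion=1)

    def get(self):
        return copy.deepcopy(self._obj)

    def update(self, obj):
        # Reject writes that were based on an older version of the object.
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        self._obj = dict(copy.deepcopy(obj),
                         resourceVersion=obj["resourceVersion"] + 1)
        return self.get()


def update_with_retry(store, mutate, attempts=5):
    """Re-read, re-apply the change, and retry when a conflict occurs."""
    for _ in range(attempts):
        obj = store.get()
        mutate(obj)
        try:
            return store.update(obj)
        except Conflict:
            continue
    raise Conflict(f"still conflicting after {attempts} attempts")


store = FakeStore({"name": "plan-user30-vm-54068-r9kbn", "annotations": {}})
stale = store.get()          # first writer reads version 1
store.update(store.get())    # second writer saves first, version becomes 2
try:
    store.update(stale)      # first writer's save is now rejected
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True

result = update_with_retry(
    store, lambda o: o["annotations"].update(note="example"))
```

If the conflicts persist across many retries, that would point to several writers contending for the same PVC rather than a one-off race.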
I found the following in Alerts:
- When running VMs using ODF storage with 'rbd' mounter or 'rbd.csi.ceph.com provisioner', VMs may cause reports of bad crc/signature errors due to certain I/O patterns. Cluster performance can be severely degraded if the number of re-transmissions due to crc errors causes network saturation.
- 100% of the rook-ceph-mgr-external/rook-ceph-mgr-external targets in the openshift-storage namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
The csi-rbdplugin-provisioner is disavowing all knowledge:
and then in the csi-addons:
I0814 13:56:03.470300 1 leaderelection.go:354] lock is held by csi-rbdplugin-provisioner-86679bf8fc-smnv6 and has not yet expired
I0814 13:56:03.470331 1 leaderelection.go:255] failed to acquire lease openshift-storage/openshift-storage-rbd-csi-ceph-com-csi-addons
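The two csi-addons entries use the klog header format (severity letter, MMDD, time with microseconds, thread ID, file:line). Messages like "lock is held by ... and has not yet expired" are what a standby replica normally logs while another replica holds the lease, so they may not indicate a fault by themselves. When pasted logs run together, a small parser can split them back into entries. This is a minimal sketch; `KLOG_HEADER` and `split_klog` are hypothetical helper names:

```python
import re

# klog header: severity, MMDD, HH:MM:SS.uuuuuu, thread id, file:line, then "] ".
KLOG_HEADER = re.compile(
    r"(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^\s:]+):(?P<line>\d+)\] "
)


def split_klog(text):
    """Split a string holding one or more klog entries into a list of dicts."""
    headers = list(KLOG_HEADER.finditer(text))
    entries = []
    for i, header in enumerate(headers):
        # The message runs until the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        entry = header.groupdict()
        entry["line"] = int(entry["line"])
        entry["message"] = text[header.end():end].strip()
        entries.append(entry)
    return entries


fused = (
    "I0814 13:56:03.470300 1 leaderelection.go:354] lock is held by "
    "csi-rbdplugin-provisioner-86679bf8fc-smnv6 and has not yet expired "
    "I0814 13:56:03.470331 1 leaderelection.go:255] failed to acquire lease "
    "openshift-storage/openshift-storage-rbd-csi-ceph-com-csi-addons"
)
entries = split_klog(fused)
for e in entries:
    print(e["time"], e["file"], e["line"], e["message"])
```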
csi-rbdplugin-provisioner-86679bf8fc-q8xtv:
I0814 13:55:14.893726 1 controllerserver-default.go:42] ID: 21 Using default ControllerGetCapabilities
I0814 13:55:14.893856 1 utils.go:205] ID: 21 GRPC response: {"capabilities":[{"Type":{"Rpc":
}},{"Type":{"Rpc":
{"type":5}}},{"Type":{"Rpc":
{"type":7}}},{"Type":{"Rpc":
{"type":9}}}]}
I0814 13:55:44.624873 1 utils.go:198] ID: 22 GRPC call: /identity.Identity/GetCapabilities
I0814 13:55:44.624910 1 utils.go:199] ID: 22 GRPC request: {}
I0814 13:55:44.625032 1 utils.go:205] ID: 22 GRPC response: {"capabilities":[{"Type":{"Service":
}},{"Type":{"ReclaimSpace":
{"type":1}}},{"Type":{"NetworkFence":
{"type":1}}},{"Type":{"VolumeReplication":
{"type":1}}}]}
I0814 14:01:50.291479 1 utils.go:198] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 GRPC call: /csi.v1.Controller/CreateVolume
I0814 14:01:50.291670 1 utils.go:199] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 GRPC request: {"capacity_range":
,"name":"pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23","parameters":
{"clusterID":"openshift-storage","csi.storage.k8s.io/pv/name":"pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23","csi.storage.k8s.io/pvc/name":"prime-992a2ced-f6cb-4e67-a445-d806f20656e5","csi.storage.k8s.io/pvc/namespace":"vmexamples-user30","imageFeatures":"layering,deep-flatten,exclusive-lock,object-map,fast-diff","imageFormat":"2","pool":"ocpv01-tenants","volumeNamePrefix":"ocp4-cluster-vcch9-"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":5}}]}
I0814 14:01:50.291910 1 rbd_util.go:1317] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 setting disableInUseChecks: true image features: [object-map layering fast-diff deep-flatten exclusive-lock] mounter: rbd
I0814 14:01:50.331000 1 omap.go:89] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 got omap values: (pool="ocpv01-tenants", namespace="", name="csi.volumes.default"): map[]
I0814 14:01:50.334528 1 omap.go:159] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 set omap keys (pool="ocpv01-tenants", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23:8588445b-d9fc-4e74-81e6-2894d583345e])
I0814 14:01:50.335230 1 omap.go:159] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 set omap keys (pool="ocpv01-tenants", namespace="", name="csi.volume.8588445b-d9fc-4e74-81e6-2894d583345e"): map[csi.imagename:ocp4-cluster-vcch9-8588445b-d9fc-4e74-81e6-2894d583345e csi.volname:pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 csi.volume.owner:vmexamples-user30])
I0814 14:01:50.335251 1 rbd_journal.go:491] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 generated Volume ID (0001-0011-openshift-storage-0000000000000017-8588445b-d9fc-4e74-81e6-2894d583345e) and image name (ocp4-cluster-vcch9-8588445b-d9fc-4e74-81e6-2894d583345e) for request name (pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23)
I0814 14:01:50.335294 1 rbd_util.go:425] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 rbd: create ocpv01-tenants/ocp4-cluster-vcch9-8588445b-d9fc-4e74-81e6-2894d583345e size 21504M (features: [exclusive-lock fast-diff deep-flatten layering object-map]) using mon 10.38.85.193:6789,10.38.85.191:6789,10.38.85.192:6789
I0814 14:01:50.335337 1 rbd_util.go:1565] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 setting image options on ocpv01-tenants/ocp4-cluster-vcch9-8588445b-d9fc-4e74-81e6-2894d583345e
I0814 14:01:50.356525 1 controllerserver.go:763] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 created image ocpv01-tenants/ocp4-cluster-vcch9-8588445b-d9fc-4e74-81e6-2894d583345e backed for request name pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23
I0814 14:01:50.367413 1 omap.go:159] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 set omap keys (pool="ocpv01-tenants", namespace="", name="csi.volume.8588445b-d9fc-4e74-81e6-2894d583345e"): map[csi.imageid:719a0fee6be261])
I0814 14:01:50.367542 1 utils.go:205] ID: 23 Req-ID: pvc-762c7e55-3d70-4c48-bac9-82f7eb834f23 GRPC response: {"volume":{"capacity_bytes":22548578304,"volume_context":
,"volume_id":"0001-0011-openshift-storage-0000000000000017-8588445b-d9fc-4e74-81e6-2894d583345e"}}
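The CreateVolume trace above shows the provisioner completing normally for user30's prime PVC: it generated a volume ID, created the RBD image, and returned a capacity. To cross-check those values, the volume ID can be decoded and the size recomputed. The field layout used below (version, cluster ID length, cluster ID, pool ID, object UUID) is an assumption based on how the ID in this log lines up with the cluster ID and image name, not something confirmed from the driver's documentation; `CsiVolumeId` and `parse_volume_id` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsiVolumeId:
    version: int
    cluster_id: str
    pool_id: int
    object_uuid: str


def parse_volume_id(volume_id: str) -> CsiVolumeId:
    """Decode a volume ID, ASSUMING the layout
    <version:4 hex>-<cluster ID length:4 hex>-<cluster ID>-<pool ID:16 hex>-<object UUID>.
    """
    version = int(volume_id[0:4], 16)
    cluster_len = int(volume_id[5:9], 16)
    cluster_end = 10 + cluster_len
    cluster_id = volume_id[10:cluster_end]
    pool_id = int(volume_id[cluster_end + 1:cluster_end + 17], 16)
    object_uuid = volume_id[cluster_end + 18:]
    return CsiVolumeId(version, cluster_id, pool_id, object_uuid)


# Values copied from the log lines above.
VOLUME_ID = ("0001-0011-openshift-storage-0000000000000017-"
             "8588445b-d9fc-4e74-81e6-2894d583345e")
VOLUME_NAME_PREFIX = "ocp4-cluster-vcch9-"

parsed = parse_volume_id(VOLUME_ID)
image_name = VOLUME_NAME_PREFIX + parsed.object_uuid

# "size 21504M" in the rbd create line, read as MiB, versus capacity_bytes.
size_bytes = 21504 * 1024 * 1024

print(parsed)
print(image_name)
print(size_bytes)
```

Under that assumed layout, the decoded cluster ID and image name agree with the log, and 21504 MiB equals the reported `capacity_bytes`, which suggests the image was created as requested.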
PVC Describe
# oc describe pvc plan-user1-vm-53836-gn822 -n vmexamples-user1
Name: plan-user1-vm-53836-gn822
Namespace: vmexamples-user1
StorageClass: ocs-external-storagecluster-ceph-rbd
Status: Bound
Volume: pvc-c7f4a338-5e5e-4ed7-97d8-8c55a79a22e1
Labels: app=containerized-data-importer
        app.kubernetes.io/component=storage
        app.kubernetes.io/managed-by=cdi-controller
        app.kubernetes.io/part-of=hyperconverged-cluster
        app.kubernetes.io/version=4.16.1
        migration=8c8570a0-10ef-457c-b18e-0bf261d1f19b
        plan=cda3a759-5f36-42f7-b3d6-98aa624f02b8
        vmID=vm-53836
Annotations: cdi.kubevirt.io/createdForDataVolume: f616c33c-7069-40c1-bcc2-3fa09105505e
             cdi.kubevirt.io/storage.condition.running: false
             cdi.kubevirt.io/storage.condition.running.message:
             cdi.kubevirt.io/storage.condition.running.reason: Completed
             cdi.kubevirt.io/storage.contentType: kubevirt
             cdi.kubevirt.io/storage.deleteAfterCompletion: false
             cdi.kubevirt.io/storage.pod.phase: Succeeded
             cdi.kubevirt.io/storage.pod.restarts: 0
             cdi.kubevirt.io/storage.populator.progress: 100.0%
             cdi.kubevirt.io/storage.preallocation.requested: false
             cdi.kubevirt.io/storage.usePopulator: true
             forklift.konveyor.io/disk-source: [workload_share_QHFNI] database_24/database_2.vmdk
             migration: 8c8570a0-10ef-457c-b18e-0bf261d1f19b
             plan: cda3a759-5f36-42f7-b3d6-98aa624f02b8
             pv.kubernetes.io/bind-completed: yes
             pv.kubernetes.io/bound-by-controller: yes
             vmID: vm-53836
             volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
             volume.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 5Gi
Access Modes: RWX
VolumeMode: Block
DataSource:
  APIGroup: cdi.kubevirt.io
  Kind: VolumeImportSource
  Name: volume-import-source-f616c33c-7069-40c1-bcc2-3fa09105505e
Used By: <none>
Events: <none>
PV Describe
(migration_test_env) [root@utility load_test_mtv]# oc describe pv pvc-c7f4a338-5e5e-4ed7-97d8-8c55a79a22e1
Name: pvc-c7f4a338-5e5e-4ed7-97d8-8c55a79a22e1
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: openshift-storage.rbd.csi.ceph.com
             volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-rbd-provisioner
             volume.kubernetes.io/provisioner-deletion-secret-namespace: openshift-storage
Finalizers: [external-provisioner.volume.kubernetes.io/finalizer kubernetes.io/pv-protection]
StorageClass: ocs-external-storagecluster-ceph-rbd
Status: Bound
Claim: vmexamples-user1/plan-user1-vm-53836-gn822
Reclaim Policy: Delete
Access Modes: RWX
VolumeMode: Block
Capacity: 5Gi
Node Affinity: <none>
Message:
Source:
    Type: CSI (a Container Storage Interface (CSI) volume source)
    Driver: openshift-storage.rbd.csi.ceph.com
    FSType:
    VolumeHandle: 0001-0011-openshift-storage-0000000000000017-98f7cacf-5626-48df-9fb0-8f402021c98b
    ReadOnly: false
    VolumeAttributes: clusterID=openshift-storage
                      imageFeatures=layering,deep-flatten,exclusive-lock,object-map,fast-diff
                      imageFormat=2
                      imageName=ocp4-cluster-vcch9-98f7cacf-5626-48df-9fb0-8f402021c98b
                      journalPool=ocpv01-tenants
                      pool=ocpv01-tenants
                      storage.kubernetes.io/csiProvisionerIdentity=1723643714412-1686-openshift-storage.rbd.csi.ceph.com
                      volumeNamePrefix=ocp4-cluster-vcch9-
Events: <none>