Bug
Resolution: Unresolved
Critical
odf-4.16
None
Description of problem (please be as detailed as possible and provide log snippets):
The test cases listed below failed with two different errors on an external mode cluster. Although these test cases are disruptive in nature, the errors occurred before any disruption was started in the cluster.
1. tests/functional/pv/pv_services/test_resource_deletion_during_pvc_pod_creation_deletion_and_io.py::TestResourceDeletionDuringMultipleCreateDeleteOperations::test_resource_deletion_during_pvc_pod_creation_deletion_and_io
The test case failed while running fio on a pod with an RBD Block volume mode PVC.
Test case error:
ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n namespace-test-646e9cdc1ac041a3b822c5d9c rsh pod-test-rbd-ec7be2dff97f4a3fa6038ca1a18 fio --name=fio-rand-readwrite --filename=/dev/rbdblock --readwrite=randrw --bs=4K --direct=1 --numjobs=1 --time_based=1 --runtime=30 --size=2G --iodepth=4 --invalidate=0 --fsync_on_close=1 --rwmixread=75 --ioengine=libaio --rate=1m --rate_process=poisson --output-format=json.
Error is fio: io_u error on file /dev/rbdblock: I/O error: read offset=686010368, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=1549148160, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=886792192, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=1520287744, buflen=4096
command terminated with exit code 1
2. After the fio error in the previous test case, the test cases that followed failed while creating a pod with an RBD PVC (either Block or Filesystem volume mode).
- tests/functional/pv/pvc_clone/test_resource_deletion_during_pvc_clone.py::TestResourceDeletionDuringPvcClone::test_resource_deletion_during_pvc_clone
- tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin]
- tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[cephfsplugin]
- tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin_provisioner]
Error from the test case test_resource_deletion_during_pvc_clone:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 76s default-scheduler Successfully assigned namespace-test-a3ed2aa514084ff6840faf7fb/pod-test-rbd-a59ee5fb88884539adb884322ef to compute-1
Normal SuccessfulAttachVolume 76s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-47995060-2825-439b-8fa1-a3e3af68833d"
Warning FailedMount 6s (x8 over 73s) kubelet MountVolume.MountDevice failed for volume "pvc-47995060-2825-439b-8fa1-a3e3af68833d" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 10.1.160.202:6789,10.1.160.201:6789,10.1.160.198:6789 --keyfile=***stripped*** map rbd/csi-vol-72a53628-18ba-4b5a-adfd-074add00b015 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown
The pod creation failures occurred on node compute-1. The pod in which fio failed was also running on compute-1.
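For triage, the FailedMount event above can be reduced to the image name and the errno behind the exit status. The following is a minimal sketch, not part of ocs-ci: the function name `decode_map_failure` and the regular expression are hypothetical and were written only against the event text quoted in this report. The errno table is hard-coded with the Linux value (108 is ESHUTDOWN), which matches the "(108) Cannot send after transport endpoint shutdown" line printed by rbd. It identifies the error class only, not the root cause.

```python
import re

# Linux errno value relevant to this report, hard-coded so the sketch
# gives the same answer on any platform (assumption: the node runs Linux).
LINUX_ERRNO = {
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

# Hypothetical pattern, written against the kubelet event quoted above.
MAP_FAILURE = re.compile(
    r"rbd: map failed with error an error \(exit status (?P<status>\d+)\)"
    r".*?\bmap (?P<image>\S+)"
)


def decode_map_failure(message):
    """Return (image, exit_status, errno_name), or None if no map failure is found."""
    match = MAP_FAILURE.search(message)
    if match is None:
        return None
    status = int(match.group("status"))
    name, _text = LINUX_ERRNO.get(status, ("UNKNOWN", ""))
    return match.group("image"), status, name


# Event message copied from this report, joined into a single line.
event = (
    'MountVolume.MountDevice failed for volume "pvc-47995060-2825-439b-8fa1-a3e3af68833d" : '
    "rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) "
    "occurred while running rbd args: [--id csi-rbd-node -m 10.1.160.202:6789,"
    "10.1.160.201:6789,10.1.160.198:6789 --keyfile=***stripped*** "
    "map rbd/csi-vol-72a53628-18ba-4b5a-adfd-074add00b015 "
    "--device-type krbd --options noudev], rbd error output: rbd: sysfs write failed"
)

result = decode_map_failure(event)
print(result)
```

Running the same function over the events of every failed pod would show whether all failures share the same exit status and whether they are confined to compute-1.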
Test report with error details - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/40443/testReport/
Must-gather logs collected after individual test failure - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-ext/jijoy-ext_20240731T011605/logs/failed_testcase_ocs_logs_1722434106/
Must-gather collected at the end of all tests - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-ext/jijoy-ext_20240731T011605/logs/testcases_1722434106/jijoy-ext/
==============================================================================
Version of all relevant components (if applicable):
Cluster Version 4.16.0-0.nightly-2024-07-30-181230
ODF 4.16.1-6
Ceph Version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)
===============================================================================
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. I/O fails on the RBD PVC, and creating pods with RBD PVCs fails.
Is there any workaround available to the best of your knowledge?
No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
These test cases passed in 4.16.0 and previous versions of ODF in external mode.
Steps to Reproduce:
(describing automated test steps)
1. Create an external mode ODF cluster
2. Create multiple RBD PVCs in Block and Filesystem volume modes with the supported access modes. Create CephFS PVCs as well.
3. Attach the PVCs to pods. Attach each RWX PVC to more than one pod.
4. Run fio on pods.
(from next test)
5. Create a new RBD PVC and attach it to an app pod. Schedule the pod on the node that hosted the pod in which fio failed in step 4.
To replicate the exact procedure, run the following set of test cases:
- tests/functional/pv/pv_services/test_resource_deletion_during_pvc_pod_creation_deletion_and_io.py
- tests/functional/pv/pvc_clone/test_resource_deletion_during_pvc_clone.py::TestResourceDeletionDuringPvcClone::test_resource_deletion_during_pvc_clone
- tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin]
- tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[cephfsplugin]
- tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin_provisioner]
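The test selection above can be assembled into a single invocation so that the tests run in the same order as in the failed job. This is a minimal sketch under stated assumptions: `run-ci` is used as the runner name, and all cluster-specific options (cluster path, configuration files) are deliberately omitted, so the printed command is incomplete and must be extended for a real cluster. The helper name `build_command` is hypothetical.

```python
import shlex

# Test selection copied from this report, in the order it was run.
TESTS = [
    "tests/functional/pv/pv_services/"
    "test_resource_deletion_during_pvc_pod_creation_deletion_and_io.py",
    "tests/functional/pv/pvc_clone/test_resource_deletion_during_pvc_clone.py"
    "::TestResourceDeletionDuringPvcClone::test_resource_deletion_during_pvc_clone",
    "tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py"
    "::TestResourceDeletionDuringPvcExpansion"
    "::test_resource_deletion_during_pvc_expansion[rbdplugin]",
    "tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py"
    "::TestResourceDeletionDuringPvcExpansion"
    "::test_resource_deletion_during_pvc_expansion[cephfsplugin]",
    "tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py"
    "::TestResourceDeletionDuringPvcExpansion"
    "::test_resource_deletion_during_pvc_expansion[rbdplugin_provisioner]",
]


def build_command(tests, runner="run-ci"):
    """Join the runner and the test IDs into one shell-safe command line.

    shlex.join quotes the parametrized IDs so the shell does not treat
    the square brackets as glob patterns.
    """
    return shlex.join([runner, *tests])


command = build_command(TESTS)
print(command)
```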
==============================================================================
Actual results:
In step 4, fio failed on an RBD Block volume mode PVC.
In step 5, app pod creation failed.
Expected results:
fio should complete without I/O errors, and app pod creation should succeed.
Additional info:
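The fio stderr quoted in the description can be summarized to check which operations failed and at which offsets. The sketch below is an illustration only: the function name `summarize_io_errors` and the regular expression are hypothetical and were written against the four error lines in this report, not against fio's full output format.

```python
import re

# Hypothetical pattern matching the "io_u error" lines quoted in this report.
IO_ERROR = re.compile(
    r"fio: io_u error on file (?P<path>\S+): (?P<reason>[^:]+): "
    r"(?P<op>read|write) offset=(?P<offset>\d+), buflen=(?P<buflen>\d+)"
)


def summarize_io_errors(stderr):
    """Collect the failing files, operations, and offsets from fio stderr."""
    errors = [m.groupdict() for m in IO_ERROR.finditer(stderr)]
    offsets = sorted(int(e["offset"]) for e in errors)
    return {
        "count": len(errors),
        "files": sorted({e["path"] for e in errors}),
        "ops": sorted({e["op"] for e in errors}),
        "offsets": offsets,
        # With --bs=4K every offset should be a multiple of 4096.
        "aligned_4k": all(o % 4096 == 0 for o in offsets),
    }


# Error lines copied from the description of this report.
stderr = """\
Error is fio: io_u error on file /dev/rbdblock: I/O error: read offset=686010368, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=1549148160, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=886792192, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=1520287744, buflen=4096
"""

summary = summarize_io_errors(stderr)
print(summary)
```

In this report all four errors are reads on /dev/rbdblock at 4 KiB-aligned offsets spread across the 2G test range, which suggests the whole device became unreadable rather than a single region failing.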