- Bug
- Resolution: Unresolved
- Critical
- odf-4.13
Description of problem (please be as detailed as possible and provide log snippets):
In an IBMCloud ROKS cluster, we were validating the multiple deviceset feature and observed inconsistent OSD pod scheduling. We are following this article to create the devicesets:
https://access.redhat.com/articles/6214381
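For context, the devicesets in the scenarios below are added by editing the StorageCluster CR as described in the article. A minimal sketch of the command, assuming the default resource name "ocs-storagecluster" in the "openshift-storage" namespace (adjust both to your environment):
##########################################
# Assumption: StorageCluster is named "ocs-storagecluster" and lives in the
# "openshift-storage" namespace.
oc edit storagecluster ocs-storagecluster -n openshift-storage
# Append the new deviceset entry under spec.storageDeviceSets, as shown in
# the snippets below.
##########################################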
This issue has been observed on both 4.13 and 4.14 ROKS clusters with the respective latest version of ODF.
We initially created a ROKS cluster with 3 workers of flavor 16x64G from IBMCloud and, after cluster creation, installed our addon to deploy ODF. By default this installs ODF with a single deviceset named "ocs-deviceset" using the "ibmc-vpc-block-metro-10iops-tier" storage class, and all OSD pods are evenly spread across the available workers:
##########################################
- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-10iops-tier
      volumeMode: Block
    status: {}
  name: ocs-deviceset
  placement: {}
  portable: true
  preparePlacement: {}
  replica: 3
  resources: {}
###########################################
Work> oc get no -owide -l ibm-cloud.kubernetes.io/worker-pool-name=default
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.241.0.7 Ready master,worker 8h v1.26.9+aa37255 10.241.0.7 10.241.0.7 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.128.7 Ready master,worker 8h v1.26.9+aa37255 10.241.128.7 10.241.128.7 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.64.6 Ready master,worker 8h v1.26.9+aa37255 10.241.64.6 10.241.64.6 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
##########################################
rook-ceph-osd-0-794885b46f-c2dx8 2/2 Running 0 7h39m 172.17.89.250 10.241.64.6 <none> <none>
rook-ceph-osd-1-8699d65d57-88z2g 2/2 Running 0 7h39m 172.17.66.223 10.241.128.7 <none> <none>
rook-ceph-osd-2-6b48c9b99-k8tb6 2/2 Running 0 7h38m 172.17.68.230 10.241.0.7 <none> <none>
##########################################
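For reference, an OSD placement listing like the one above can be obtained with a query along these lines (namespace and label assumed from a default ODF/rook-ceph install):
##########################################
# Assumption: ODF runs in the "openshift-storage" namespace and the OSD pods
# carry the standard rook-ceph label app=rook-ceph-osd.
oc get pods -n openshift-storage -l app=rook-ceph-osd -o wide
##########################################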
Let's add another deviceset by editing the storagecluster CR as per the above article, using the "ibmc-vpc-block-metro-5iops-tier" storage class but without the deviceClass parameter. In this case, the OSD pods are scheduled on the nodes listed above and are spread across the zones.
##########################################
- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-5iops-tier
      volumeMode: Block
    status: {}
  name: ocs-deviceset-2
  placement: {}
  portable: true
  preparePlacement: {}
  replica: 3
  resources: {}
##########################################
rook-ceph-osd-3-549df4f77d-l7w5s 2/2 Running 0 7h8m 172.17.89.249 10.241.64.6 <none> <none>
rook-ceph-osd-4-56464899-qk2bl 2/2 Running 0 7h8m 172.17.66.232 10.241.128.7 <none> <none>
rook-ceph-osd-5-7bb8c4b8c4-zszfr 2/2 Running 0 7h7m 172.17.68.238 10.241.0.7 <none> <none>
##########################################
Now create a worker pool of 3 workers from the IBMCloud UI with the name "deviceset-3" and add the following labels to its nodes (a sample labeling command is sketched after the label list below). Then create another deviceset with deviceClass "deviceset-3", storage class "ibmc-vpc-block-metro-5iops-tier", and placement rules. In this case, the OSD pods get scheduled either across only 2 of the zones or all on a single worker, depending on the affinity condition.
##########################################
cluster.ocs.openshift.io/openshift-storage: ""
cluster.ocs.openshift.io/openshift-storage-device-class: deviceset-3
##########################################
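As referenced above, a minimal sketch of applying those labels to the new worker pool nodes (node names taken from the "deviceset-3" pool listing further below; adjust as needed):
##########################################
# Assumption: these are the nodes of the "deviceset-3" worker pool.
for node in 10.241.0.9 10.241.128.12 10.241.64.11; do
  oc label node "$node" \
    cluster.ocs.openshift.io/openshift-storage="" \
    cluster.ocs.openshift.io/openshift-storage-device-class=deviceset-3
done
##########################################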
- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-5iops-tier
      volumeMode: Block
    status: {}
  deviceClass: deviceset-3
  name: ocs-deviceset-3
  placement:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cluster.ocs.openshift.io/openshift-storage-device-class
                operator: In
                values:
                  - deviceset-3
##########################################
Work> oc get no -owide -l ibm-cloud.kubernetes.io/worker-pool-name=deviceset-3
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.241.0.9 Ready master,worker 7h23m v1.26.9+aa37255 10.241.0.9 10.241.0.9 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.128.12 Ready master,worker 7h23m v1.26.9+aa37255 10.241.128.12 10.241.128.12 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.64.11 Ready master,worker 7h23m v1.26.9+aa37255 10.241.64.11 10.241.64.11 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
##########################################
rook-ceph-osd-6-6b456f7844-jp4x2 2/2 Running 0 6h14m 172.17.110.209 10.241.64.11 <none> <none>
rook-ceph-osd-7-55b98ff548-v4rsh 2/2 Running 0 6h14m 172.17.110.212 10.241.64.11 <none> <none>
rook-ceph-osd-8-b45474c5f-6vnqv 2/2 Running 0 6h13m 172.17.110.214 10.241.64.11 <none> <none>
##########################################
Same steps as in the previous scenario, but with a different storage class, "ibmc-vpc-block-metro-general-purpose". In this case, the OSD pods are all distributed across zones as expected.
##########################################
- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-general-purpose
      volumeMode: Block
    status: {}
  deviceClass: deviceset-4
  name: ocs-deviceset-4
  placement:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cluster.ocs.openshift.io/openshift-storage-device-class
                operator: In
                values:
                  - deviceset-4
  portable: true
  preparePlacement: {}
  replica: 3
  resources: {}
##########################################
Work> oc get no -owide -l ibm-cloud.kubernetes.io/worker-pool-name=deviceset-4
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.241.0.10 Ready master,worker 7h57m v1.26.9+aa37255 10.241.0.10 10.241.0.10 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.128.13 Ready master,worker 7h56m v1.26.9+aa37255 10.241.128.13 10.241.128.13 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.64.12 Ready master,worker 7h57m v1.26.9+aa37255 10.241.64.12 10.241.64.12 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
##########################################
rook-ceph-osd-9-67dd868dc8-jhw4q 2/2 Running 0 4h56m 172.17.116.72 10.241.128.13 <none> <none>
rook-ceph-osd-10-54d5b69df5-mvvzj 2/2 Running 0 4h56m 172.17.125.8 10.241.64.12 <none> <none>
rook-ceph-osd-11-548ff94bdb-sp7cv 2/2 Running 0 4h56m 172.17.75.137 10.241.0.10 <none> <none>
##########################################
Same steps as in the previous scenario, with the same storage class "ibmc-vpc-block-metro-general-purpose". In this case, the OSD pods are distributed unevenly, with 2 OSDs in the same zone.
##########################################
- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-general-purpose
      volumeMode: Block
    status: {}
  deviceClass: deviceset-5
  name: ocs-deviceset-5
  placement:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cluster.ocs.openshift.io/openshift-storage-device-class
                operator: In
                values:
                  - deviceset-5
  portable: true
  preparePlacement: {}
  replica: 3
  resources: {}
##########################################
Work> oc get no -owide -l ibm-cloud.kubernetes.io/worker-pool-name=deviceset-5
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.241.0.11 Ready master,worker 6h30m v1.26.9+aa37255 10.241.0.11 10.241.0.11 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.128.14 Ready master,worker 6h30m v1.26.9+aa37255 10.241.128.14 10.241.128.14 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
10.241.64.13 Ready master,worker 6h30m v1.26.9+aa37255 10.241.64.13 10.241.64.13 Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.27.1.el8_8.x86_64 cri-o://1.26.4-5.1.rhaos4.13.git969e013.el8
##########################################
rook-ceph-osd-12-6fc6c68645-cwdwz 2/2 Running 0 4h1m 172.17.91.201 10.241.64.13 <none> <none>
rook-ceph-osd-13-6f6cb46d4f-55xsz 2/2 Running 0 4h1m 172.17.91.203 10.241.64.13 <none> <none>
rook-ceph-osd-14-7988b69947-csrkl 2/2 Running 0 4h 172.17.103.72 10.241.0.11 <none> <none>
##########################################
Version of all relevant components (if applicable):
Latest ODF 4.13 & 4.14
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
We are assessing the multiple deviceset feature of ODF on IBMCloud for customers.
Is there any workaround available to the best of your knowledge?
If we include podAntiAffinity rules covering the OSD and OSD prepare pods in the deviceset placement, the OSD pods are scheduled as expected (a verification query is sketched after the snippet):
##############################################
placement:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cluster.ocs.openshift.io/openshift-storage-device-class
              operator: In
              values:
                - deviceset-8
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - rook-ceph-osd
                  - rook-ceph-osd-prepare
          topologyKey: topology.kubernetes.io/zone
        weight: 100
##############################################
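As referenced above, the resulting prepare job and OSD pod placement can be verified with a selector along these lines (labels and namespace assumed from a default rook-ceph deployment):
##############################################
# Assumption: prepare pods carry app=rook-ceph-osd-prepare and OSD pods carry
# app=rook-ceph-osd in the "openshift-storage" namespace.
oc get pods -n openshift-storage \
  -l 'app in (rook-ceph-osd, rook-ceph-osd-prepare)' -o wide
##############################################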
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3
Is this issue reproducible?
Yes, tried in 2 clusters in 2 different environments:
4.13 on Prod Env
4.14 on Internal Stage Env
Can this issue be reproduced from the UI?
NA
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
All scenarios are detailed in the description
Actual results:
Even spreading across zones works only when the storage classes differ between devicesets.
Expected results:
OSD pods should be scheduled across nodes from different zones.
Additional info:
NA