-
Bug
-
Resolution: Won't Do
-
Minor
-
None
-
odf-4.14
-
None
+++ This bug was initially created as a clone of Bug #2269099 +++
Description of problem (please be as detailed as possible and provide log
snippets):
- During ODF installation, rook-ceph-operator failed to run the OSD provisioning job because of the following error [1]: Invalid value: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0kxvcr-bridge": must be no more than 63 characters.
$ pvc="ocs-deviceset-smrj-local-volume-storageclass-0-data-0kxvcr-bridge"
$ echo ${#pvc}
65
- The customer hit this error while following the installation steps in "Chapter 3. Deploy using local storage devices" [2].
- Step 4 reads: "In the Create local volume set page, provide the following information:
Enter a name for the LocalVolumeSet and the StorageClass."
The customer entered the StorageClass name "smrj-local-volume-storageclass", which is 30 characters long, and this resulted in the error.
- The StorageClass Name field has a name-requirement help popup, opened by clicking "i" on the right.
It currently says 'No more than 253 characters' [3]; it should say 'No more than 28 characters'.
[1]
$ omc logs rook-ceph-operator-6b86fc8fc4-qvkv8 -n openshift-storage | grep "E | op-osd"
2024-03-07T13:35:27.892559450+09:00 2024-03-07 04:35:27.892528 E | op-osd: failed to run osd provisioning job for PVC "ocs-deviceset-smrj-local-volume-storageclass-0-data-0kxvcr": Job.batch "rook-ceph-osd-prepare-8dba0c8fcaa117dc4c6a006f80aaa291" is invalid: [spec.template.spec.volumes[10].name: Invalid value: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0kxvcr-bridge": must be no more than 63 characters, spec.template.spec.containers[0].volumeMounts[9].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0kxvcr-bridge", spec.template.spec.initContainers[1].volumeMounts[0].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0kxvcr-bridge"]
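The 28-character figure in the description can be derived from the 63-character limit in the error and the fixed parts of the failing volume name. The following is a minimal Python sketch of that arithmetic. The constant and function names are made up for illustration, and the suffix shape is taken only from the single name in the log above, so other deployments may produce different suffixes.

```python
# Limit quoted in the operator error above.
MAX_VOLUME_NAME_LEN = 63

# Name parts observed in the failing volume name from the log (assumed fixed).
DEVICE_SET_PREFIX = "ocs-deviceset-"  # prepended to the StorageClass name
PVC_SUFFIX = "-0-data-0kxvcr"         # set index, "data", and a generated suffix
BRIDGE_SUFFIX = "-bridge"             # appended for the bridge volume


def max_storage_class_len() -> int:
    """Characters left for the StorageClass name after the fixed parts."""
    fixed = len(DEVICE_SET_PREFIX) + len(PVC_SUFFIX) + len(BRIDGE_SUFFIX)
    return MAX_VOLUME_NAME_LEN - fixed


def bridge_volume_name(storage_class: str) -> str:
    """Rebuild the volume name seen in the log for a given StorageClass name."""
    return f"{DEVICE_SET_PREFIX}{storage_class}{PVC_SUFFIX}{BRIDGE_SUFFIX}"


name = bridge_volume_name("smrj-local-volume-storageclass")
print(f"volume name length: {len(name)}")
print(f"longest StorageClass name that fits: {max_storage_class_len()}")
```

With the customer's 30-character StorageClass name the rebuilt volume name is two characters over the limit, which matches the `echo ${#pvc}` result above.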
Version of all relevant components (if applicable):
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No
Is there any workaround available to the best of your knowledge?
Yes
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1
Is this issue reproducible?
Yes
Can this issue reproduce from the UI?
Yes
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Install ODF from GUI
2. Enter a StorageClass name longer than 28 characters.
Actual results:
The UI allows entering a StorageClass name longer than 28 characters.
Expected results:
The UI issues a warning for, or rejects, a StorageClass name longer than 28 characters.
Additional info:
— Additional comment from RHEL Program Management on 2024-03-12 04:03:07 UTC —
This bug having no release flag set previously, is now set with release flag 'odf‑4.15.0' to '?', and so is being proposed to be fixed at the ODF 4.15.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.
— Additional comment from Kenichiro Kagoshima on 2024-03-12 04:05:14 UTC —
Version of all relevant components (if applicable):
4.14
— Additional comment from Kenichiro Kagoshima on 2024-03-12 04:05:24 UTC —
Version of all relevant components (if applicable):
4.14
— Additional comment from Sanjal Katiyar on 2024-03-12 07:10:58 UTC —
We may restrict UI to input at max 28 characters (instead of current 253), but please note that:
1. As per the current UX (step 4i. under "3.3. Creating OpenShift Data Foundation cluster on VMware vSphere" [2]) we will have to restrict characters for LocalVolumeSet name as well.
2. This issue can still occur via CLI as both K8s api-server (https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names) and LSO won't restrict this while StorageClass or LocalVolumeSet creation respectively.
AFAIK "ocs-deviceset-*" PVCs are owned by CephCluster, so adding a need info on the rook team, just to cross-confirm that "28" characters StorageClass name will indeed work for all cases (for PVCs or any other resources which rook might be creating using the name of the LSO StorageClass).
— Additional comment from Travis Nielsen on 2024-03-12 13:50:56 UTC —
(In reply to Sanjal Katiyar from comment #4)
> We may restrict UI to input at max 28 characters (instead of current 253),
> but please note that:
>
> 1. As per the current UX (step 4i. under "3.3. Creating OpenShift Data
> Foundation cluster on VMware vSphere" [2]) we will have to restrict
> characters for LocalVolumeSet name as well.
>
> 2. This issue can still occur via CLI as both K8s api-server
> (https://kubernetes.io/docs/concepts/overview/working-with-objects/names/
> #names) and LSO won't restrict this while StorageClass or LocalVolumeSet
> creation respectively.
>
>
> AFAIK "ocs-deviceset-*" PVCs are owned by CephCluster, so adding a need info
> on the rook team, just to cross-confirm that "28" characters StorageClass
> name will indeed work for all cases (for PVCs or any other resources which
> rook might be creating using the name of the LSO StorageClass).
The PVCs generated by ODF/Rook don't depend on the length of the storage class name, so the length of the LSO storage class should not affect us adversely.
— Additional comment from Sanjal Katiyar on 2024-04-05 07:49:08 UTC —
Had a bit of offline discussion here: https://ibm-systems-storage.slack.com/archives/C06EY7A3A2C/p1710253576536249. I also tried to repro this out of curiosity, and the issue is easily reproducible.
OSD pods did not get created and corresponding PVCs are in "Pending" state. Did not find anything relevant in ODF/OCS logs, just Rook logs:
--------------------------------------------------------------------------------------------------------
2024-04-05 07:12:30.354803 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 3 failures encountered while running osds on nodes in namespace "openshift-storage".
failed to run osd provisioning job for PVC "ocs-deviceset-smrj-local-volume-storageclass-0-data-0rxgpx": Job.batch "rook-ceph-osd-prepare-4436b2c9cb194f7bc97c816ea374b9ee" is invalid: [spec.template.spec.volumes[10].name: Invalid value: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0rxgpx-bridge": must be no more than 63 characters, spec.template.spec.containers[0].volumeMounts[9].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0rxgpx-bridge", spec.template.spec.initContainers[1].volumeMounts[0].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-0-data-0rxgpx-bridge"]
failed to run osd provisioning job for PVC "ocs-deviceset-smrj-local-volume-storageclass-1-data-0jngk6": Job.batch "rook-ceph-osd-prepare-59592a8761c92bd4824e96a21cf7170c" is invalid: [spec.template.spec.volumes[10].name: Invalid value: "ocs-deviceset-smrj-local-volume-storageclass-1-data-0jngk6-bridge": must be no more than 63 characters, spec.template.spec.containers[0].volumeMounts[9].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-1-data-0jngk6-bridge", spec.template.spec.initContainers[1].volumeMounts[0].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-1-data-0jngk6-bridge"]
failed to run osd provisioning job for PVC "ocs-deviceset-smrj-local-volume-storageclass-2-data-06wm67": Job.batch "rook-ceph-osd-prepare-0479de3d43faccb04f5209855ec88a81" is invalid: [spec.template.spec.volumes[10].name: Invalid value: "ocs-deviceset-smrj-local-volume-storageclass-2-data-06wm67-bridge": must be no more than 63 characters, spec.template.spec.containers[0].volumeMounts[9].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-2-data-06wm67-bridge", spec.template.spec.initContainers[1].volumeMounts[0].name: Not found: "ocs-deviceset-smrj-local-volume-storageclass-2-data-06wm67-bridge"]
--------------------------------------------------------------------------------------------------------
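For triage, the rejected names and their lengths can be pulled out of Rook log text like the excerpt above. This is a small Python sketch, not an existing tool. The regular expression is written against the message wording shown in these logs, and the helper name `find_overlong_names` is hypothetical.

```python
import re

# Matches the rejected name and the limit in messages worded like the logs above.
INVALID_NAME = re.compile(
    r'Invalid value: "([^"]+)": must be no more than (\d+) characters'
)


def find_overlong_names(log_text: str) -> list[tuple[str, int, int]]:
    """Return (name, actual_length, limit) for each rejected name in the text."""
    return [
        (m.group(1), len(m.group(1)), int(m.group(2)))
        for m in INVALID_NAME.finditer(log_text)
    ]


# Fragment copied from the first failure in the excerpt above.
sample = (
    'spec.template.spec.volumes[10].name: Invalid value: '
    '"ocs-deviceset-smrj-local-volume-storageclass-0-data-0rxgpx-bridge": '
    'must be no more than 63 characters'
)

for name, length, limit in find_overlong_names(sample):
    print(f"{name}: {length} characters, limit {limit}")
```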
We can limit the StorageClass name length from the UI; however, this has its own limitations (see comment#c4):
1. Adding UI validation does not resolve the root cause; the issue can still be hit via the CLI.
2. K8s allows StorageClass names longer than 28 characters, so limiting them in our case is not the best user experience.
For now I am moving this BZ to rook for further exploration, to check whether it is possible to trim the names of the resources that get created with the LSO StorageClass name as a prefix or suffix. If it is not possible to fix this on the backend, then as a last resort we can fix it from the UI (with the above-mentioned limitations).
— Additional comment from Subham Rai on 2024-04-08 06:03:03 UTC —
IIRC, The storageClass is created and that is passed to rook for creating the PVC, so rook doesn't control the storageClassName limit. So probably it has to be managed by UI or the user when done via CLI.
wdyt @tnielsen@redhat.com
— Additional comment from Sanjal Katiyar on 2024-04-08 06:14:31 UTC —
(In reply to Subham Rai from comment #7)
> IIRC, The storageClass is created and that is passed to rook for creating
> the PVC, so rook doesn't control the storageClassName limit. So probably it
> has to be managed by UI or the user when done via CLI.
Just to point out, request in comment#6 is to control the name limit of the resources which rook creates (like PVC) using the name of StorageClass (as pre/post-fix), not the name of the StorageClass itself (yes, UI/CLI creates this StorageClass)...
— Additional comment from Subham Rai on 2024-04-08 06:43:15 UTC —
(In reply to Sanjal Katiyar from comment #8)
> (In reply to Subham Rai from comment #7)
> > IIRC, The storageClass is created and that is passed to rook for creating
> > the PVC, so rook doesn't control the storageClassName limit. So probably it
> > has to be managed by UI or the user when done via CLI.
>
> Just to point out, request in comment#6 is to control the name limit of the
> resources which rook creates (like PVC) using the name of StorageClass (as
> pre/post-fix), not the name of the StorageClass itself (yes, UI/CLI creates
> this StorageClass)...
I guess that, if required, Rook would be able to limit or truncate that name.
— Additional comment from Sanjal Katiyar on 2024-04-08 16:40:44 UTC —
> (yes, UI/CLI creates this StorageClass)...
I would like to correct myself a bit here: UI/CLI just pass down the StorageClass name to the StorageCluster CR (not "create" it). Here is the snippet:
...
storageDeviceSets: [
  {
    name: `ocs-deviceset-${storageClassName}`,
    ...
    dataPVCTemplate: {
      spec:
      ,
    },
  },
]
...
— Additional comment from Sanjal Katiyar on 2024-04-08 16:49:39 UTC —
It now seems to me (someone from rook, please confirm) that rook uses the deviceSet name, not the StorageClass name directly, as the prefix when creating PVCs.
That said, the request in comment#c6 & comment#c8 still stands, but instead of truncating the StorageClass name, maybe we can look into truncating the deviceSet name (if possible).
— Additional comment from Travis Nielsen on 2024-04-08 22:57:54 UTC —
If Rook truncates the device set name, then we run the risk of having duplicate names. For example, if the device sets are named with a numerical suffix, we might incorrectly cut off that suffix and cause duplicates. But since duplicates are not allowed, it would again fail. Another alternative could be to take a hash of the set name, but then it is more difficult to troubleshoot because the hash name is difficult to associate with the set.
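The trade-off described here can be illustrated with a short Python sketch. The two device set names and the 42-character limit are hypothetical, chosen only so that the names differ in their numeric suffix. The `truncate` and `hashed` helpers are illustrative and do not reflect how Rook names resources.

```python
import hashlib


def truncate(name: str, limit: int) -> str:
    """Naive truncation: keep only the first `limit` characters."""
    return name[:limit]


def hashed(name: str, limit: int) -> str:
    """Replace the name with a hex digest prefix: distinct, but opaque."""
    return hashlib.sha256(name.encode()).hexdigest()[:limit]


# Hypothetical device set names that differ only in a trailing number.
set_a = "ocs-deviceset-a-rather-long-storage-class-name-1"
set_b = "ocs-deviceset-a-rather-long-storage-class-name-2"
LIMIT = 42  # hypothetical length budget for the device set name

# Truncation cuts off the distinguishing suffix, so both names collapse into one.
print("truncated names collide:", truncate(set_a, LIMIT) == truncate(set_b, LIMIT))

# Hashing keeps the names distinct, but the result no longer resembles the set name.
print("hashed names collide:", hashed(set_a, LIMIT) == hashed(set_b, LIMIT))
print("hashed form of set_a:", hashed(set_a, LIMIT))
```

The truncated names are identical, so the second set would be rejected as a duplicate, while the hashed names stay distinct at the cost of being hard to map back to the original set during troubleshooting.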
My recommendation is that we add an x-validation rule to the CRD that prevents a set name that is longer than supported, and the UI would need to reject storage class names that are too long as well. A storage class with a name that long is too much of a corner case to change this, since we have had the product for multiple years and this hasn't come up previously.
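As a rough illustration of the check such a validation rule would express, here is a Python sketch. An actual rule would be written in the CRD's own validation syntax, which is not shown here. The 42-character limit is an assumption derived from the suffix shape seen in the logs of this bug ("-0-data-0kxvcr-bridge"), and `validate_device_set_name` is a hypothetical helper.

```python
# Limit quoted in the operator error in this bug.
MAX_VOLUME_NAME_LEN = 63

# Assumed suffix shape, taken from the names in the logs of this bug.
ASSUMED_SUFFIX_LEN = len("-0-data-0kxvcr-bridge")

# Characters left for the device set name under that assumption.
MAX_DEVICE_SET_NAME_LEN = MAX_VOLUME_NAME_LEN - ASSUMED_SUFFIX_LEN


def validate_device_set_name(name: str) -> list[str]:
    """Return a list of validation errors; an empty list means the name fits."""
    errors = []
    if len(name) > MAX_DEVICE_SET_NAME_LEN:
        errors.append(
            f"device set name {name!r} is {len(name)} characters; "
            f"at most {MAX_DEVICE_SET_NAME_LEN} are supported"
        )
    return errors


# The device set name from this bug is rejected; a shorter hypothetical one passes.
print(validate_device_set_name("ocs-deviceset-smrj-local-volume-storageclass"))
print(validate_device_set_name("ocs-deviceset-localblock"))
```

Rejecting the name when the resource is created surfaces the problem immediately, instead of at OSD provisioning time as happened in this bug.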
— Additional comment from Subham Rai on 2024-04-12 07:04:21 UTC —
X-validator in CR resources makes sense.