Feature
Resolution: Unresolved
Critical
Quality / Stability / Reliability
0% To Do, 100% In Progress, 0% Done
Feature Overview (aka. Goal Summary)
Make sure LSO checks /dev symlinks before the cluster is upgraded to the OCP version that introduces RHCOS 10 support.
Goals (aka. expected user outcomes)
LSO needs some way to rework symlinks for existing PVs. Our current proposal is to introduce a change in OCP 4.21 that allows LSO to choose a better symlink that points to the same device BEFORE the cluster is upgraded to OCP 4.22 (or whichever version introduces RHCOS 10).
There should be an alert on affected clusters notifying the admin of the action they should take to resolve the issue before upgrading.
See design notes for more details.
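To illustrate what "choose a better symlink that points to the same device" could look like, here is a minimal sketch. It is not the LSO implementation: the function name, the prefix constant, and the example link names are all hypothetical, and it assumes that any other symlink in the same directory that resolves to the same device and does not start with `scsi-0NVME_` is an acceptable candidate.

```python
import os
import tempfile

# Assumption: links with this prefix are the ones that disappear on RHEL 10.
AT_RISK_PREFIX = "scsi-0NVME_"


def find_alternative_links(by_id_dir, current_link):
    """Return the names of other symlinks in by_id_dir that resolve to the
    same device as current_link and do not use the at-risk prefix."""
    target = os.path.realpath(os.path.join(by_id_dir, current_link))
    alternatives = []
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        # Skip the link we are replacing and anything that is not a symlink.
        if name == current_link or not os.path.islink(path):
            continue
        # Skip links that would disappear as well.
        if name.startswith(AT_RISK_PREFIX):
            continue
        if os.path.realpath(path) == target:
            alternatives.append(name)
    return alternatives


def demo():
    """Build a throwaway directory that mimics /dev/disk/by-id.
    All link names below are made up for illustration."""
    with tempfile.TemporaryDirectory() as root:
        device = os.path.join(root, "nvme0n1")
        open(device, "w").close()
        by_id = os.path.join(root, "by-id")
        os.mkdir(by_id)
        for name in ("scsi-0NVME_example_1", "nvme-eui.example", "nvme-example_1"):
            os.symlink(device, os.path.join(by_id, name))
        return find_alternative_links(by_id, "scsi-0NVME_example_1")


if __name__ == "__main__":
    print(demo())
```

On a real node the directory would be `/dev/disk/by-id`. How LSO should rank several candidates is left open here and belongs in the design notes.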
Requirements (aka. Acceptance Criteria):
- Detect symlinks that can break with RHCOS 10
- Raise an alert when such a symlink is detected
- Provide actionable steps to resolve the issue
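The detection and alerting requirements above can be sketched roughly as follows. This is only an illustration under stated assumptions: the at-risk pattern is taken from the Background section (`/dev/disk/by-id/scsi-0NVME_*`), while the PV-to-path mapping, the function names, and the message wording are hypothetical and do not reflect how LSO actually stores or reports this information.

```python
import fnmatch
import posixpath

# Pattern of symlinks that RHEL 10 no longer creates (see Background).
AT_RISK_PATTERN = "/dev/disk/by-id/scsi-0NVME_*"


def at_risk_pvs(pv_paths):
    """Given a hypothetical mapping of PV name -> device symlink path,
    return the sorted names of PVs whose path matches the at-risk pattern."""
    return sorted(
        pv
        for pv, path in pv_paths.items()
        if fnmatch.fnmatchcase(posixpath.normpath(path), AT_RISK_PATTERN)
    )


def alert_message(pv, path):
    """Build an illustrative, actionable message for one affected PV."""
    return (
        f"PV {pv} uses {path}, which may disappear after upgrading to "
        "RHCOS 10. Switch to another symlink for the same device before upgrading."
    )


if __name__ == "__main__":
    # Example data only; the PV names and paths are made up.
    pvs = {
        "local-pv-a": "/dev/disk/by-id/scsi-0NVME_example_1",
        "local-pv-b": "/dev/disk/by-id/nvme-example_2",
    }
    for name in at_risk_pvs(pvs):
        print(alert_message(name, pvs[name]))
```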
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT (for the configuration to be supported in the future) as well.
| Deployment considerations | List applicable specific needs (N/A = not applicable) |
| --- | --- |
| Self-managed, managed, or both | both |
| Classic (standalone cluster) | yes |
| Hosted control planes | yes |
| Multi node, Compact (three node), or Single node (SNO), or all | all that use LSO |
| Connected / Restricted Network | both |
| Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | any that use LSO |
| Operator compatibility | LSO |
| Backport needed (list applicable versions) | no |
| UI need (e.g. OpenShift Console, dynamic plugin, OCM) | no |
| Other (please specify) | LSO only |
Use Cases (Optional):
As an admin, I want to be notified of any PV that is using a symlink that could break after I upgrade to an OCP version that uses RHCOS 10. I should also be informed of the steps to follow to prevent the issue proactively.
Questions to Answer (Optional):
Which is the first OCP version that introduces RHCOS 10?
Out of Scope
Non-LSO environments.
Greenfield clusters (day 1 install on an OCP version that uses RHCOS 10). This Feature applies to upgraded clusters only.
Background
RHEL docs state that "Device names managed by udev in /dev/disk/ can change between major releases, requiring link updates." We have a specific case, which became apparent in OCPBUGS-61988, that we need to mitigate for RHEL 10 (OCP 4.22).
sg3_utils 1.48 in RHEL 10 disables a udev rule that creates /dev/disk/by-id/scsi-0NVME_* symlinks on RHEL 9.x. This udev rule is problematic for some customers (see the support cases attached to OCPBUGS-61988), but disabling it is also problematic for others who may already be using those symlinks successfully today, because upgrading to RHEL 10 will cause those /dev/disk/by-id symlinks to disappear.
Customer Considerations
It's important to land this feature at least one OCP version before RHCOS 10 is officially supported.
Without this, data may be unavailable after upgrade for affected clusters, requiring support escalations to manually rework the symlinks after upgrade. Even then, it may not be obvious how to reconstruct the symlinks after upgrade, since the original ones are gone by that point.
Documentation Considerations
TBC after final implementation, but we should have a section in the LSO doc that guides customers on how to respond to potential alerts, i.e. how to change the symlink.
Interoperability Considerations
Any OCP version that is using LSO