OpenShift Bugs / OCPBUGS-29474

Disk naming is not persistent across reboots in CoreOS, defeating conventional fixes, affecting multiple environments, and requiring a robust solution.


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version/s: 4.12.z
    • Component/s: RHCOS
    • Severity: Critical
    • Customer Escalated, Customer Facing

      Description of problem:

      Here are the complete details of what we have observed and faced:

      • A few days back, we faced a low-IOPS issue on Azure.
      • We recommended using a disk with better IOPS.
      • To achieve this, we got on a remote session with the customer.
      • Interestingly, we found that there were two disks attached to the CoreOS node, and this happens only in Azure (the temporary disk) [1]; see the lsblk sketch below.
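      For illustration, this is roughly what the layout looks like on such an Azure node (hypothetical lsblk output; names and sizes are examples only). Here sda is the OS disk and sdb is the Azure temporary (resource) disk, which Azure attaches on many VM sizes in addition to any data disks:

      lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
      NAME   SIZE TYPE MOUNTPOINT
      sda    128G disk
      sda1     1M part
      sda2   127M part /boot/efi
      sda3   384M part /boot
      sda4   127G part /sysroot
      sdb     64G disk
      sdb1    64G part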

       

      ==============================================

      We then faced another big problem while adding an additional, faster disk specifically for etcd; we ran into the issues below:

      We continue to face this challenge with Red Hat CoreOS, and despite extensive efforts, we haven't identified a viable solution yet. One of our SPTSEs created a KCS article [2] back in the 4.3 timeframe, which addressed bare-metal deployments. However, given the current reliance on cloud providers, that approach appears impractical.

      We require assistance in brainstorming options to ensure that these mounts remain persistent across reboots. Utilizing /etc/fstab on CoreOS doesn't seem suitable or practical for our needs. Additionally, relying on /dev/disk/by-path and /dev/disk/by-id values presents challenges, since they differ for each machine and disk; therefore a single MachineConfig with secondary mounts wouldn't provide a comprehensive solution, as the sketch below illustrates.
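      To make that last point concrete, here is a minimal sketch (not a proposed fix) of the systemd mount unit such a MachineConfig would have to ship, assuming the etcd disk is addressed by its by-id symlink. The What= value is taken from the /dev/disk/by-id listing further below and belongs to one specific node's disk, so this exact unit cannot be rolled out cluster-wide from a single MachineConfig:

      # var-lib-etcd.mount -- the unit name must match the escaped mount path
      [Unit]
      Before=local-fs.target

      [Mount]
      # by-id value copied from the sdb entry in the listing below; node-specific
      What=/dev/disk/by-id/scsi-3600224808fe8bd31e72955aadc4cf77d
      Where=/var/lib/etcd
      Type=xfs

      [Install]
      WantedBy=local-fs.target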

       

      Would it be advisable to generate /etc/fstab entries using UUIDs and establish distinct machine-config-pools for each machine individually, given that this would entail hard-coded entries? While our documentation [3] offers some solutions concerning secondary disks, relying on disk names isn't reliable across reboots, resulting in instability for clients.
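      For reference, a UUID-based /etc/fstab entry would look roughly like the following (the UUID shown is made up). Because the UUID is generated when each node's filesystem is created, every node would need its own hard-coded entry, which is what leads to the per-machine machine-config-pool question above:

      # hypothetical entry for a dedicated etcd filesystem; the UUID differs on every node
      UUID=0b6e9f2c-8f2d-4a51-9c7e-6d1f2a3b4c5d  /var/lib/etcd  xfs  defaults  0 0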

      Furthermore, through extensive discussions across SBRs, we have delved into this matter in detail and have learned additional perspectives, summarized as follows.

      For creating a distinct, dedicated secondary /var partition, we have a documented procedure [4] that hard-codes the disk name as /dev/nvme1n1, the absolute path of the AWS block device. The procedure itself works, but any change in the disk name makes it fail, and the same approach runs into trouble on bare metal, VMware vSphere, or Azure, where disk names typically appear as /dev/sda or /dev/sdb. A sketch of that configuration follows.
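      Roughly, the Butane config from [4] looks like the following (paraphrased from the documentation, not verbatim; the partition offset is a placeholder). The hard-coded device path under storage.disks is the fragile part:

      variant: openshift
      version: 4.14.0
      metadata:
        name: 98-var-partition
        labels:
          machineconfiguration.openshift.io/role: worker
      storage:
        disks:
          - device: /dev/nvme1n1        # AWS-specific; would be /dev/sda or /dev/sdb elsewhere
            partitions:
              - label: var
                start_mib: 25000        # placeholder offset
                size_mib: 0             # 0 = use the rest of the disk
        filesystems:
          - device: /dev/disk/by-partlabel/var
            path: /var
            format: xfs
            mount_options: [defaults, prjquota]
            with_mount_unit: true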

      A similar procedure [5] is outlined for bare-metal UPI, which relies on by-id values. However, these values vary for each node, adding complexity to the procedure. Example by-id values from one node are shown below.

       

       

      ls -ltr /dev/disk/by-id
      total 0
      lrwxrwxrwx. 1 root root  9 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6 -> ../../sda
      lrwxrwxrwx. 1 root root  9 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6 -> ../../sda
      lrwxrwxrwx. 1 root root  9 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6 -> ../../sda
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part1 -> ../../sda1
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part1 -> ../../sda1
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part1 -> ../../sda1
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part4 -> ../../sda4
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part4 -> ../../sda4
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part4 -> ../../sda4
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part2 -> ../../sda2
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part2 -> ../../sda2
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part2 -> ../../sda2
      lrwxrwxrwx. 1 root root  9 Feb 12 19:05 ata-Virtual_CD -> ../../sr0
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part3 -> ../../sda3
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part3 -> ../../sda3
      lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part3 -> ../../sda3
      lrwxrwxrwx. 1 root root  9 Feb 12 19:40 wwn-0x600224808fe8bd31e72955aadc4cf77d -> ../../sdb
      lrwxrwxrwx. 1 root root  9 Feb 12 19:40 scsi-SMsft_Virtual_Disk_8FE8BD31E729E641A58E55AADC4CF77D -> ../../sdb
      lrwxrwxrwx. 1 root root  9 Feb 12 19:40 scsi-3600224808fe8bd31e72955aadc4cf77d -> ../../sdb
      lrwxrwxrwx. 1 root root  9 Feb 12 19:40 scsi-14d534654202020208fe8bd31e729e641a58e55aadc4cf77d -> ../../sdb
      

      So the Butane file would need different values for each node here, since the by-id of the secondary disk differs for each worker/master. We have many clients using KCS [6] as well, and it too relies on a hard-coded disk name, i.e. /dev/sdb, which isn't consistent. Oscar, in the comments section, suggests using UUIDs, but that doesn't seem feasible either; see the sketch below.
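      To spell out the trade-off, here is a sketch of the storage section such a Butane file would contain (paraphrased; this is not the exact content of [6]). Each of the three ways of naming the device is problematic for a single fleet-wide MachineConfig, as noted in the comments:

      storage:
        filesystems:
          # kernel name (/dev/sdb): not guaranteed to point at the same disk after a
          #   reboot, which is the instability we are seeing.
          # by-id: stable on a given node, but different on every node, e.g.
          #   /dev/disk/by-id/scsi-3600224808fe8bd31e72955aadc4cf77d from the listing above.
          # UUID: only exists after the filesystem is created, so it is also per-node.
          - device: /dev/sdb
            path: /var/lib/etcd
            format: xfs
            with_mount_unit: true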

      [1] https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview#temporary-disk
      [2] https://access.redhat.com/solutions/5023051
      [3] https://docs.openshift.com/container-platform/4.12/scalability_and_performance/recommended-performance-scale-practices/recommended-etcd-practices.html#move-etcd-different-disk_recommended-etcd-practices
      [4] https://docs.openshift.com/container-platform/4.14/post_installation_configuration/node-tasks.html#machine-node-custom-partition_post-install-node-tasks
      [5] https://docs.openshift.com/container-platform/4.14/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-advanced_vardisk_installing-bare-metal
      [6] https://access.redhat.com/solutions/4952011

      Steps to Reproduce:

      Yes, it's 100% reproducible on Azure IPI OCP.

       

            Assignee: Jonathan Lebon (jlebon1@redhat.com)
            Reporter: Bharat Babbar (rhn-support-bbabbar)
            QA Contact: Michael Nguyen