Description of problem:
Here are the complete details of what we have observed and faced:
- A few days back, we faced a low-IOPS issue on Azure.
- We recommended using a disk with better IOPS.
- To achieve this, we joined a remote session with the customer.
- Interestingly, we found that there were 2 disks attached to the CoreOS node, and this happens only in Azure (temporary disk) [1].
==============================================
We faced another big problem while adding an additional, faster disk specifically for etcd; we hit the issues below:
We continue to face this challenge with Red Hat CoreOS, and despite extensive efforts, we haven't identified a viable solution yet. One of our SPTSEs created KCS [2] back in version 4.3, which addressed bare-metal deployment. However, given the current reliance on cloud providers, this approach appears impractical.
We require assistance in brainstorming various options to ensure that mounts remain persistent across reboots. Utilizing /etc/fstab for CoreOS doesn't seem suitable or practical for our needs. Additionally, relying on /dev/disk/by-path and by-id values presents challenges since they differ for each machine and disk. Therefore, a single machine-config with secondary mounts wouldn't provide a comprehensive solution.
Would it be advisable to generate /etc/fstab entries using UUIDs and establish distinct machine-config-pools for each machine individually, given that this would entail hard-coded entries? While our documentation [3] offers some solutions concerning secondary disks, it relies on disk names, which aren't stable across reboots, resulting in instability for clients.
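To make the concern concrete, here is a rough sketch (an illustration only, not a tested or recommended configuration) of what the UUID approach would look like as a Butane file rendered into a MachineConfig: a systemd mount unit whose What= points at /dev/disk/by-uuid. The name and UUID below are placeholders, and because each node's secondary disk carries its own UUID, every node would need its own copy of this file plus a dedicated pool or label to target it, which is exactly the hard-coding problem described above.

variant: openshift
version: 4.14.0
metadata:
  # Placeholder name; in practice the label would have to target a
  # per-node custom pool, since the UUID is valid on one node only
  name: 98-master-0-var-lib-etcd
  labels:
    machineconfiguration.openshift.io/role: master
systemd:
  units:
    - name: var-lib-etcd.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount the dedicated etcd disk by filesystem UUID
        Before=local-fs.target
        [Mount]
        # Placeholder UUID -- differs for every node's secondary disk
        What=/dev/disk/by-uuid/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
        Where=/var/lib/etcd
        Type=xfs
        [Install]
        WantedBy=local-fs.target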
Furthermore, through extensive discussions across SBRs, we have delved into this matter in detail and have learned additional perspectives, summarized as follows.
For the establishment of a distinct and dedicated secondary /var partition, we've documented a procedure [4] that involves hardcoding the disk name as /dev/nvme1n1, which is the absolute path of the AWS block device. While the procedure itself remains the same, any change in the disk name would result in failure. This approach may therefore run into issues on bare metal, VMware vSphere, or Azure, where disk names typically appear as /dev/sda or /dev/sdb, potentially leading to failures.
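For reference, the fragile part of that kind of procedure comes down to a single hard-coded device path in the Butane storage section. The sketch below is only an approximation of the pattern (not a copy of the documented configuration); the device: line is what breaks when the disk enumerates under a different name:

variant: openshift
version: 4.14.0
metadata:
  name: 98-var-partition
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  disks:
    # Hard-coded AWS NVMe path; the same disk typically appears as
    # /dev/sdb (or similar) on Azure, vSphere, or bare metal
    - device: /dev/nvme1n1
      wipe_table: false
      partitions:
        - label: var
          start_mib: 0   # place at the first available offset
          size_mib: 0    # 0 = use all remaining space
  filesystems:
    - device: /dev/disk/by-partlabel/var
      path: /var
      format: xfs
      mount_options: [defaults, prjquota]
      with_mount_unit: true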
A similar procedure [5] is outlined for bare metal UPI, which relies on by-id values. However, these values differ for each node, adding complexity to the procedure. Example by-id values are provided below:
ls -ltr /dev/disk/by-id
total 0
lrwxrwxrwx. 1 root root 9 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6 -> ../../sda
lrwxrwxrwx. 1 root root 9 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6 -> ../../sda
lrwxrwxrwx. 1 root root 9 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6 -> ../../sda
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part4 -> ../../sda4
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part4 -> ../../sda4
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part4 -> ../../sda4
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 9 Feb 12 19:05 ata-Virtual_CD -> ../../sr0
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 wwn-0x60022480522cf2d84b3fb8c42ef578e6-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-360022480522cf2d84b3fb8c42ef578e6-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 10 Feb 12 19:05 scsi-14d53465420202020522cf2d84b3fa94cb8c2b8c42ef578e6-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 9 Feb 12 19:40 wwn-0x600224808fe8bd31e72955aadc4cf77d -> ../../sdb
lrwxrwxrwx. 1 root root 9 Feb 12 19:40 scsi-SMsft_Virtual_Disk_8FE8BD31E729E641A58E55AADC4CF77D -> ../../sdb
lrwxrwxrwx. 1 root root 9 Feb 12 19:40 scsi-3600224808fe8bd31e72955aadc4cf77d -> ../../sdb
lrwxrwxrwx. 1 root root 9 Feb 12 19:40 scsi-14d534654202020208fe8bd31e729e641a58e55aadc4cf77d -> ../../sdb
So the Butane file would need different values for each worker here, since the by-id of the secondary disk differs for each worker/master. We have many clients using this KCS [6] as well, and it too relies on a hard-coded disk name, i.e. /dev/sdb, which isn't consistent. Oscar suggests using UUIDs in the comments section, but that doesn't seem feasible either.
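As an illustration of why each node would need its own file, below is a rough per-worker sketch (again, not a recommended configuration) that mirrors the UUID example above but uses one of the by-id values from the listing, the WWN of the /dev/sdb secondary disk. That value identifies one specific disk on one specific VM, so the file cannot be shared across the pool; the unit name and MachineConfig name are placeholders, and the sketch assumes a filesystem created directly on the disk.

variant: openshift
version: 4.14.0
metadata:
  # Placeholder name; one such file would be needed per worker
  name: 98-worker-0-var-lib-etcd
  labels:
    machineconfiguration.openshift.io/role: worker
systemd:
  units:
    - name: var-lib-etcd.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount the secondary disk by its by-id path
        Before=local-fs.target
        [Mount]
        # WWN taken from the listing above; unique to this node's disk
        What=/dev/disk/by-id/wwn-0x600224808fe8bd31e72955aadc4cf77d
        Where=/var/lib/etcd
        Type=xfs
        [Install]
        WantedBy=local-fs.target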
[1] https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview#temporary-disk
[2] https://access.redhat.com/solutions/5023051
[3] https://docs.openshift.com/container-platform/4.12/scalability_and_performance/recommended-performance-scale-practices/recommended-etcd-practices.html#move-etcd-different-disk_recommended-etcd-practices
[4] https://docs.openshift.com/container-platform/4.14/post_installation_configuration/node-tasks.html#machine-node-custom-partition_post-install-node-tasks
[5] https://docs.openshift.com/container-platform/4.14/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-advanced_vardisk_installing-bare-metal
[6] https://access.redhat.com/solutions/4952011
Steps to Reproduce:
Yes, it's 100% reproducible on Azure IPI OCP.
Account is impacted by:
- OCPBUGS-29817: Non-guaranteed persistent disk naming across reboots causing ODF pods go down (Closed)