Story
Resolution: Unresolved
Normal
rhel-9.2.0.z, rhel-9.4.z, rhel-9.6.z
rhel-net-mgmt
Context: The Assisted-Installer tool allows users to install OpenShift easily on their own infrastructure (bare metal, VMs, on premises or in a cloud).
What were you trying to do that didn't work?
In a dual-stack environment, using ip=auto as a kernel argument only waits for one stack to become available. This causes problems when downloading files (the RHCOS Ignition file in our case) in the initramfs, because the stack needed to fetch the file might not be configured.
What is the impact of this issue to you?
To work around ip=auto in the past, we explicitly configured only the NIC and the stacks that we detect during the installation process. This works well for day1 installations, because we have all the details needed to decide what the network configuration should be. The day2 process (adding a node to an existing cluster) does not work as well, because we do not get as many details, so we leave ip=auto on such nodes.
On top of that, we added support for iSCSI, which we need to configure explicitly because of the explicit configuration above (otherwise, the NIC dedicated to the iSCSI volume is left unconfigured, and the volume cannot be mounted).
This combination led to subtle bugs where machines are left with an unconfigured network stack and get stuck at boot time.
The main goal of this bug is to review this complexity and see whether NetworkManager can help us make things simpler.
Please provide the package NVR for which the bug is seen:
NetworkManager / nm-initrd-generator
How reproducible is this bug?:
On a machine connected to a dual-stack network and booted with `ip=auto`, if DHCPv4 is a bit slow, only IPv6 is configured.
Steps to reproduce
See above
Expected results
- ensure that all possible stacks are configured on a NIC (best effort, after a timeout)
- do not fail if a NIC does not get any configuration (best effort, after a timeout)
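To make the requested semantics concrete, here is a minimal sketch that contrasts the reported "first stack wins" behavior with the best-effort behavior asked for above. It is only a model of the decision logic, not NetworkManager code. The function names, the `ReadyTimes` type and the timing values are hypothetical and chosen for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical input: seconds until each stack obtains a configuration,
# or None if it never does. These are illustrative values, not measurements.
ReadyTimes = Dict[str, Optional[float]]


def first_stack_wins(ready_at: ReadyTimes) -> Set[str]:
    """Model of the reported behavior: stop waiting once any stack is up."""
    times = [t for t in ready_at.values() if t is not None]
    if not times:
        return set()
    first = min(times)
    return {s for s, t in ready_at.items() if t == first}


def best_effort(ready_at: ReadyTimes, timeout: float) -> Tuple[Set[str], float]:
    """Model of the requested behavior.

    Wait until every stack is configured or the timeout expires, keep
    whatever came up, and never treat an unconfigured NIC as fatal.
    Returns the configured stacks and the time at which boot continues.
    """
    up = {s for s, t in ready_at.items() if t is not None and t <= timeout}
    if ready_at and len(up) == len(ready_at):
        # All stacks came up: no need to wait for the full timeout.
        return up, max(ready_at[s] for s in up)
    return up, timeout
```

For example, with `{"ipv4": 8.0, "ipv6": 1.0}` (slow DHCPv4), `first_stack_wins` keeps only IPv6, while `best_effort` with a 10 second timeout keeps both stacks and continues as soon as DHCPv4 completes. With no stack available at all, `best_effort` returns an empty set after the timeout instead of failing.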
Actual results
- with ip=auto, only one stack might be configured, depending on timing.
In a Slack thread, we discussed using ip=dhcp,dhcp6, which would match the expected behavior. The only issue would be the default, non-configurable 20s timeout, which might be a bit long and can impact users when they reboot their machines.
After testing, it looks like ip=dhcp,dhcp6 expects at least one stack to be configured on a NIC; otherwise the boot drops into emergency mode. In some setups, only some of the NICs are configured with DHCP (in Oracle Cloud Infrastructure, for example, the primary NIC is configured with DHCP, and secondary ones must be configured statically).