- Bug
- Resolution: Done-Errata
- Undefined
- None
- 4.14
- Important
- No
- 3
- Sprint 234 - Team OSInt
- 1
- Proposed
- False
-
Description of problem:
Using RHCOS 413.92, the iPXE static IPv6 bonding configuration we previously used for both single-stack IPv6 and IPv4/IPv6 dual-stack deployments no longer works as expected. Example of the iPXE configuration:
~~~
#!ipxe
kernel http://OBFUSCATED:8000/rhcos/images/rhcos-413.92.202303281804-0-live-kernel-x86_64 initrd=main bond=bond0:enp1s0f0np0,enp1s0f1np1:mode=802.3ad,lacp_rate=0,miimon=100,updelay=200,downdelay=200 ip=[2604:1380:4642:7e00::27]::[2604:1380:4642:7e00::26]:127:master-00.pamoedo-rhcos92b.qe.devcluster.openshift.com:bond0:none nameserver=[2001:4860:4860::8888] nameserver=[2001:4860:4860::8844] console=tty0 console=ttyS1,115200n8 coreos.live.rootfs_url=http://[OBFUSCATED]:8000/rhcos/images/rhcos-413.92.202303281804-0-live-rootfs.x86_64.img ignition.config.url=http://[OBFUSCATED]:8000/rhcos/ignitions/pamoedo-rhcos92b/master-console-hook.ign ignition.firstboot ignition.platform.id=metal
initrd --name main http://OBFUSCATED:8000/rhcos/images/rhcos-413.92.202303281804-0-live-initramfs.x86_64.img
boot
~~~
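To make the bonding and addressing kargs in the iPXE configuration easier to audit, the following Python sketch splits the `bond=` and `ip=` values into their fields. It assumes the field order documented in dracut.cmdline(7): `bond=<bondname>:<bondslaves>:<options>[:<mtu>]` and `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:<autoconf>`, with IPv6 literals wrapped in brackets. The helper names (`split_karg_fields`, `parse_bond`, `parse_static_ip`) are hypothetical, and this is not the parser the RHCOS initramfs actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def split_karg_fields(value: str) -> List[str]:
    """Split a karg value on ':' while keeping [IPv6] literals intact (assumed syntax)."""
    fields, current, depth = [], [], 0
    for ch in value:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == ":" and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    # Drop the brackets that only delimit IPv6 literals.
    return [f[1:-1] if f.startswith("[") and f.endswith("]") else f for f in fields]


@dataclass
class BondSpec:
    name: str
    slaves: List[str]
    options: dict = field(default_factory=dict)
    mtu: Optional[int] = None


def parse_bond(value: str) -> BondSpec:
    """Parse bond=<name>:<slaves>:<options>[:<mtu>]; options are comma-separated key=value."""
    parts = value.split(":")
    slaves = parts[1].split(",") if len(parts) > 1 and parts[1] else []
    options = {}
    if len(parts) > 2 and parts[2]:
        for opt in parts[2].split(","):
            key, _, val = opt.partition("=")
            options[key] = val
    mtu = int(parts[3]) if len(parts) > 3 and parts[3] else None
    return BondSpec(parts[0], slaves, options, mtu)


@dataclass
class StaticIP:
    client: str
    peer: str
    gateway: str
    netmask: str  # may hold a prefix length, e.g. "127" in this report
    hostname: str
    interface: str
    autoconf: str


def parse_static_ip(value: str) -> StaticIP:
    """Parse the 7-field static form of ip=; extra optional fields are ignored."""
    fields = split_karg_fields(value)
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    return StaticIP(*fields[:7])


# Usage with the values from the iPXE configuration in this report.
bond = parse_bond("bond0:enp1s0f0np0,enp1s0f1np1:mode=802.3ad,lacp_rate=0,miimon=100,updelay=200,downdelay=200")
ip = parse_static_ip("[2604:1380:4642:7e00::27]::[2604:1380:4642:7e00::26]:127:master-00.pamoedo-rhcos92b.qe.devcluster.openshift.com:bond0:none")
print(bond.slaves)  # ['enp1s0f0np0', 'enp1s0f1np1']
print(ip.gateway, ip.interface)  # 2604:1380:4642:7e00::26 bond0
```

Splitting on top-level colons only is the key design point: a naive `split(":")` would shatter every IPv6 address in the `ip=` argument.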
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-04-01-062001 with LACP (802.3ad) bonding
How reproducible:
Always, but not on all nodes at the same time. It varies a lot depending on the NIC vendor, yet there is no clear pattern between NIC drivers and/or vendors; it looks more related to the bonding kernel driver and some kind of instability or race condition.
Steps to Reproduce:
1. Deploy single-stack or dual-stack OCP via iPXE with static LACP bonding set via kargs.
2. Use the custom Ignition procedure (https://coreos.github.io/coreos-installer/customizing-install/#custom-coreos-installer-invocation) to retain the same kargs.
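The steps above do not spell out how the kargs are carried over to the installed system. One hypothetical way, assuming the custom Ignition config invokes `coreos-installer install` itself, is to copy the network-related kargs from the live kernel command line and pass each one with `coreos-installer`'s `--append-karg` option. The device path, Ignition file path, and the set of prefixes treated as "network kargs" are assumptions for illustration; the sketch only builds the argument list and does not run the installer.

```python
from typing import List

# Assumption: these prefixes cover the network kargs used in this report.
NETWORK_KARG_PREFIXES = ("bond=", "ip=", "nameserver=")


def network_kargs(cmdline: str) -> List[str]:
    """Return the kargs from a kernel command line that configure networking."""
    return [arg for arg in cmdline.split() if arg.startswith(NETWORK_KARG_PREFIXES)]


def installer_argv(device: str, ignition_file: str, cmdline: str) -> List[str]:
    """Build a coreos-installer argv that re-appends the live network kargs."""
    argv = ["coreos-installer", "install", device, "--ignition-file", ignition_file]
    for karg in network_kargs(cmdline):
        argv += ["--append-karg", karg]
    return argv


# Usage with a shortened, hypothetical live command line.
sample = (
    "initrd=main bond=bond0:enp1s0f0np0,enp1s0f1np1:mode=802.3ad "
    "ip=[2604:1380:4642:7e00::27]::[2604:1380:4642:7e00::26]:127:master-00:bond0:none "
    "nameserver=[2001:4860:4860::8888] console=tty0 ignition.firstboot"
)
print(installer_argv("/dev/sda", "/tmp/config.ign", sample))
```

Filtering by prefix keeps live-only arguments such as `ignition.firstboot` and `coreos.live.rootfs_url` out of the installed system's kargs.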
Actual results:
The instances boot properly with the iPXE bonding configuration and fetch the custom Ignition config. After that, some of them (mostly those with Intel cards) lose connectivity and cannot fetch the second Ignition file with the proper master/worker profile.
Expected results:
The bonding configuration succeeds after the initial boot and persists across reboots, as it did in RHCOS 413.8x.
Additional info:
- Also related to OCPBUGS-10787 (RHCOS 9.2 NIC renaming)

- blocks: OCPBUGS-11657 [4.13] Static IPv6 LACP bonding is randomly failing in RHCOS 413.92 (Closed)
- is cloned by: OCPBUGS-11657 [4.13] Static IPv6 LACP bonding is randomly failing in RHCOS 413.92 (Closed)
- links to: RHEA-2023:5006 rpm