OpenShift Bugs / OCPBUGS-11052

Static IPv6 LACP bonding is randomly failing in RHCOS 413.92

    • Bug
    • Resolution: Done-Errata
    • Undefined
    • None
    • 4.14
    • RHCOS
    • Important
    • No
    • 3
    • Sprint 234 - Team OSInt
    • 1
    • Proposed
    • False

      Description of problem:

      With RHCOS 413.92, the iPXE static IPv6 bonding configuration we previously used for single-stack IPv6 and dual-stack IPv4/IPv6 deployments no longer works as expected.
      
      Example of the iPXE configuration:
      
      ~~~
      #!ipxe
      
      kernel http://OBFUSCATED:8000/rhcos/images/rhcos-413.92.202303281804-0-live-kernel-x86_64 initrd=main bond=bond0:enp1s0f0np0,enp1s0f1np1:mode=802.3ad,lacp_rate=0,miimon=100,updelay=200,downdelay=200 ip=[2604:1380:4642:7e00::27]::[2604:1380:4642:7e00::26]:127:master-00.pamoedo-rhcos92b.qe.devcluster.openshift.com:bond0:none nameserver=[2001:4860:4860::8888] nameserver=[2001:4860:4860::8844] console=tty0 console=ttyS1,115200n8 coreos.live.rootfs_url=http://[OBFUSCATED]:8000/rhcos/images/rhcos-413.92.202303281804-0-live-rootfs.x86_64.img ignition.config.url=http://[OBFUSCATED]:8000/rhcos/ignitions/pamoedo-rhcos92b/master-console-hook.ign ignition.firstboot ignition.platform.id=metal
      initrd --name main http://OBFUSCATED:8000/rhcos/images/rhcos-413.92.202303281804-0-live-initramfs.x86_64.img
      boot
      ~~~
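
To make the two network kargs above easier to read, here is a minimal Python sketch that splits them into named fields. It assumes the dracut command-line layouts `bond=<name>:<slaves>:<options>` and `ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>`; the function names are illustrative and not part of any RHCOS tooling.

```python
def split_karg_fields(value):
    """Split a karg value on ':' while keeping [bracketed] IPv6 literals intact."""
    fields, current, depth = [], [], 0
    for ch in value:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == ":" and depth == 0:
            # A colon outside brackets ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_bond(value):
    """Parse the value of bond=<name>:<slaves>:<options> (assumed layout)."""
    name, slaves, options = (split_karg_fields(value) + ["", ""])[:3]
    return {
        "name": name,
        "slaves": slaves.split(",") if slaves else [],
        "options": dict(opt.split("=", 1) for opt in options.split(",") if opt),
    }


def parse_static_ip(value):
    """Parse the value of a static ip= karg into named fields (assumed layout)."""
    keys = ["client", "peer", "gateway", "netmask", "hostname", "interface", "autoconf"]
    fields = split_karg_fields(value)
    fields += [""] * (len(keys) - len(fields))
    # Drop the brackets that wrap IPv6 literals on the kernel command line.
    return {key: field.strip("[]") for key, field in zip(keys, fields)}
```

Applied to the script above, `parse_bond` yields the bond `bond0` with slaves `enp1s0f0np0` and `enp1s0f1np1` in mode `802.3ad`, and `parse_static_ip` yields client `2604:1380:4642:7e00::27`, gateway `2604:1380:4642:7e00::26`, prefix length `127`, and interface `bond0`.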
      

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-04-01-062001
      LACP (802.3ad) bonding

      How reproducible:

      Always, but not on all nodes at the same time. It varies a lot depending on the NIC vendor, and there is no clear pattern across NIC drivers or vendors. It looks more related to the bonding kernel driver and some kind of instability or race condition.

      Steps to Reproduce:

      1. Deploy single/dual-stack OCP via iPXE with static LACP bonding set via kargs
      2. Use custom ignition procedure (https://coreos.github.io/coreos-installer/customizing-install/#custom-coreos-installer-invocation) to retain the same kargs.
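
Step 2 can be sketched as follows. This is a hedged illustration only, not the reporter's actual procedure: it collects the `bond=`, `ip=` and `nameserver=` arguments from a kernel command line and passes each one to `coreos-installer install` through `--append-karg`. The target device and Ignition URL are hypothetical placeholders.

```python
import shlex

# Kernel arguments that carry the static network configuration.
NETWORK_KARG_PREFIXES = ("bond=", "ip=", "nameserver=")


def network_kargs(cmdline):
    """Return the network-related kargs found in a kernel command line string."""
    return [tok for tok in cmdline.split() if tok.startswith(NETWORK_KARG_PREFIXES)]


def build_install_command(cmdline, device, ignition_url):
    """Build a coreos-installer argv that re-applies the network kargs."""
    cmd = ["coreos-installer", "install", device, "--ignition-url", ignition_url]
    for karg in network_kargs(cmdline):
        cmd += ["--append-karg", karg]
    return cmd


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        live_cmdline = f.read()
    # "/dev/sda" and the URL below are placeholders, not values from this report.
    argv = build_install_command(live_cmdline, "/dev/sda", "https://example.invalid/master.ign")
    print(shlex.join(argv))
```

The script only prints the command, so it can be reviewed before being wired into a custom installer unit.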

      Actual results:

      The instances boot properly with the iPXE bonding configuration and fetch the custom Ignition config. After that, some of them (mostly those with Intel cards) lose connectivity and are unable to fetch the second Ignition file with the proper master/worker profile.
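
When a node loses connectivity like this, one way to narrow it down is to inspect the bond state from the console. The sketch below parses the text of `/proc/net/bonding/bond0` and lists slaves whose link is down or that sit outside the active LACP aggregator. The field names (`Slave Interface`, `MII Status`, `Aggregator ID`) are assumed from the kernel bonding driver's usual output and should be verified on the affected kernel; the helper names are illustrative.

```python
def parse_bonding_status(text):
    """Parse /proc/net/bonding/<bond> text into bond-level and per-slave fields."""
    bond, slaves, section = {}, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings and blank lines carry no "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Every following field belongs to this slave until the next one.
            section = slaves.setdefault(value, {})
        elif section is None:
            bond.setdefault(key, value)
        else:
            section.setdefault(key, value)  # keep the first occurrence of each key
    return {"bond": bond, "slaves": slaves}


def slaves_down(status):
    """Names of slaves whose MII status is not 'up'."""
    return sorted(
        name for name, fields in status["slaves"].items()
        if fields.get("MII Status") != "up"
    )


def slaves_outside_active_aggregator(status):
    """Names of slaves whose aggregator ID differs from the bond's active one."""
    active = status["bond"].get("Aggregator ID")
    return sorted(
        name for name, fields in status["slaves"].items()
        if fields.get("Aggregator ID") != active
    )
```

A typical call would be `parse_bonding_status(open("/proc/net/bonding/bond0").read())`. A slave reported by either helper suggests that link detection or LACP negotiation did not complete for that port.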

      Expected results:

      Successful bonding configuration after the initial boot and across reboots, as was the case in RHCOS 413.8x.

      Additional info:

      - Also related to OCPBUGS-10787 (RHCOS 9.2 NIC renaming)
      

              rhn-gps-dmabe Dusty Mabe
              rhn-support-pamoedom Pedro Jose Amoedo Martinez
              Michael Nguyen Michael Nguyen