-
Story
-
Resolution: Done
One issue with rebooting the DPU and the host in the two-cluster design is that we cannot guarantee the DPU will be up before the host tries to load the ignition file over the underlay network. If the host cannot reach the ignition server (which is managed by MCO), it falls back to the default configuration instead of the last applied one, even though the last one worked. As a result, the VFs cannot be created on the host, because switchdev-configuration-before-nm.service is absent. With no VFs, the ovnkube-node pod fails to start on the DPU, so the flows needed to reach the tenant cluster API server cannot be loaded onto br-ex. Consequently, the Machine Config Daemon (MCD) on the host cannot reach the API server after boot, and it cannot detect that the host is not in the desired MachineConfig.
The current workaround is to reboot the x86 host again (assuming the BF-2 has already fully booted). On that next boot, the host gets the correct config and MCD can access the API server.
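To decide whether the second reboot is needed, the symptoms described above can be checked on the host. The following is a minimal diagnostic sketch, not part of any existing tooling. It assumes that `systemctl cat` is available on the host and that the VF count can be read from the standard SR-IOV sysfs attribute `sriov_numvfs`. The function names and the rule "unit missing or zero VFs means reboot" are illustrative assumptions, and the PF interface name must be supplied by the caller.

```python
import subprocess
from pathlib import Path

# Unit name taken from the description above.
UNIT = "switchdev-configuration-before-nm.service"


def unit_is_installed(unit: str = UNIT) -> bool:
    """Return True if systemd can find a unit file with this name."""
    # `systemctl cat` exits non-zero when the unit file does not exist.
    result = subprocess.run(
        ["systemctl", "cat", unit], capture_output=True, text=True
    )
    return result.returncode == 0


def count_vfs(pf: str, sysfs_root: Path = Path("/sys/class/net")) -> int:
    """Return the number of VFs configured on physical function `pf`.

    Reads the SR-IOV sysfs attribute; returns 0 if it is missing or unreadable.
    """
    path = sysfs_root / pf / "device" / "sriov_numvfs"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return 0


def host_needs_reboot(unit_installed: bool, vf_count: int) -> bool:
    """Assumed heuristic: the host booted with the default configuration
    if the switchdev unit is absent or no VFs were created."""
    return (not unit_installed) or vf_count == 0
```

On a host, this could be called as `host_needs_reboot(unit_is_installed(), count_vfs(pf_name))`, where `pf_name` is the name of the PF interface on that machine. The check only reports the symptom; it does not confirm that the BF-2 has finished booting, which the workaround still assumes.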