If the network to the bootstrap VM is slow, extract-machine-os.service can time out (after 180s). When that happens, systemd restarts it, but services that depend on it (such as ironic) are never started, even once it eventually succeeds. systemd added support for Restart=on-failure in Type=oneshot services, but such services still don't behave the same way as other service types.
This can be simulated in dev-scripts by limiting the bandwidth on the ostestbm interface with tc:
sudo tc qdisc add dev ostestbm root netem rate 33Mbit
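As a rough sanity check on why this rate triggers the failure, the sketch below computes the most data that can cross the throttled link before the 180s timeout. It assumes the whole timeout is spent downloading and ignores protocol overhead. The variable names are illustrative, and the size of the machine-os image is not stated here.

```shell
# Upper bound on data transferable through the throttled link before the timeout.
rate_mbit=33      # netem rate from the tc command above
timeout_s=180     # service timeout from the description

# Megabits -> megabytes (divide by 8); shell integer division truncates.
max_mb=$(( rate_mbit * timeout_s / 8 ))
echo "At most ~${max_mb} MB can transfer in ${timeout_s}s at ${rate_mbit} Mbit/s"

# To inspect and later remove the throttle (left commented out because it
# needs root and the ostestbm interface to exist):
# tc qdisc show dev ostestbm
# sudo tc qdisc del dev ostestbm root
```

Anything larger than this bound cannot finish within the timeout at that rate, so raising or lowering the netem rate moves the threshold proportionally.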
- blocks: OCPBUGS-41500 Slow network causes metal IPI bootstrap to fail (Closed)
- is cloned by: OCPBUGS-41500 Slow network causes metal IPI bootstrap to fail (Closed)
- is duplicated by: OCPBUGS-36853 ironic.service fails to start on bootstrap node when provisioning network is disabled (Closed)
- links to: RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update