- Bug
- Resolution: Done
- 4.13, 4.12, 4.14, 4.15, 4.16, 4.17, 4.18
- Quality / Stability / Reliability
- Moderate
- OSDOCS Sprint 270
Description of problem:
nodeip-configuration.service is not configured to run after the nmstate service. This matters when the network is to be configured via the nmstate service (e.g. as per https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/installing_on_bare_metal/installer-provisioned-infrastructure#ipi-install-establishing-communication-between-subnets_ipi-install-installation-workflow). It can be problematic in bond scenarios, where the following can happen:
- The node gets some networking configuration via NetworkManager before nmstate can set up the connections properly. Because of that, some interface temporarily has a wrong IP, namely IP1.
- The nodeip-configuration service runs and wrongly deems IP1 to be the right node IP.
- Then nmstate runs and sets up the right networking. For example, there is now a bond whose IP is IP2. IP2 is the right IP for the node, not IP1.
- However, because the nodeip-configuration service ran before nmstate and stored the wrong IP1 in /run/nodeip-configuration/primary-ip, the wait-for-primary-ip service blocks forever trying to use the wrong IP1 instead of the right IP2.
Possible workarounds:
- Reboot the node
- Write the right IP in /run/nodeip-configuration/primary-ip and manually restart the wait-for-primary-ip service
Proposed solution:
Make nodeip-configuration.service run after the nmstate service to ensure that the nmstate config is applied.
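To tell whether a node has hit this race, one option is to compare the IP stored in /run/nodeip-configuration/primary-ip with the addresses the node currently has. The following is a minimal sketch, not part of the ticket. It assumes the file holds a single IP address as plain text and that the current addresses come from the JSON output of `ip -j addr show`. The function names are hypothetical.

```python
import ipaddress
import json


def addresses_from_ip_json(ip_json_text):
    """Collect every address found in the JSON printed by `ip -j addr show`."""
    found = set()
    for link in json.loads(ip_json_text):
        for info in link.get("addr_info", []):
            local = info.get("local")
            if local:
                found.add(ipaddress.ip_address(local))
    return found


def primary_ip_is_stale(stored_text, current_addresses):
    """Return True when the stored primary IP is not assigned to any interface.

    stored_text is assumed to be the raw content of
    /run/nodeip-configuration/primary-ip (a single IP address).
    """
    stored = ipaddress.ip_address(stored_text.strip())
    return stored not in current_addresses
```

On an affected node, the stored value would be IP1 while the bond holds IP2, so `primary_ip_is_stale` would return True. In that case the workarounds listed above apply.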
Version-Release number of selected component (if applicable):
4.18.7
How reproducible:
Often (race condition) as long as steps to reproduce are met.
Steps to Reproduce:
1. Start with a cluster that has a bond interface defined via host-level nmstate in machine-config, as per https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/installing_on_bare_metal/installer-provisioned-infrastructure#ipi-install-establishing-communication-between-subnets_ipi-install-installation-workflow . The environment uses DHCP, and the DHCP server might assign IPs to the individual interfaces that are part of the bond before the bond is configured via nmstate.
2. Add a new node
3.
Actual results:
Quite often, nodes end up stuck forever in the wait-for-primary-ip service because nodeip-configuration ran too early and wrote one of the wrong IPs that the node interfaces temporarily had.
Expected results:
nodeip-configuration runs after the nmstate service, so that it always sees the final network configuration.
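Whether a given node already has the expected ordering can be checked from the output of `systemctl show -p After nodeip-configuration.service`, which prints a line of the form `After=unit1 unit2 ...`. The sketch below parses that output. The helper name is hypothetical, and the unit name `nmstate.service` is an assumption, since the ticket only refers to the "nmstate service".

```python
def runs_after(systemctl_show_output, dependency):
    """Return True if `dependency` is listed in the After= property.

    systemctl_show_output is the text printed by
    `systemctl show -p After <unit>`.
    """
    for line in systemctl_show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "After":
            # The property value is a space-separated list of unit names.
            return dependency in value.split()
    return False
```

For example, `runs_after(output, "nmstate.service")` returning False on a node would match the behaviour described in this ticket.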
Additional info:
I found this while doing an internal reproducer, but I wouldn't be surprised if some production user eventually finds this too.
- clones OCPBUGS-54711 nodeip-configuration.service must run after nmstate (Closed)