-
Bug
-
Resolution: Cannot Reproduce
-
Undefined
-
None
-
4.16
-
None
-
Critical
-
No
-
Rejected
-
False
-
Description of problem:
After a node reboot, on-prem-resolv-prepender.service fails to start, which prevents the node from joining the cluster.
systemctl status on-prem-resolv-prepender.service --no-pager -l
× on-prem-resolv-prepender.service - Populates resolv.conf according to on-prem IPI needs
Loaded: loaded (/etc/systemd/system/on-prem-resolv-prepender.service; static)
Active: failed (Result: exit-code) since Wed 2024-03-06 08:38:42 UTC; 4h 47min ago
Process: 27970 ExecStart=/usr/local/bin/resolv-prepender.sh (code=exited, status=1/FAILURE)
Main PID: 27970 (code=exited, status=1/FAILURE)
CPU: 319ms
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=debug msg="Stack of address '2620:52:0:1d0::23/128' and VIP '10.1.208.10' does not match. Skipping."
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=debug msg="Checking whether address 2620:52:0:1d0::23/128 with route {Ifindex: 14 Dst: 2620:52:0:1d0::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} contains VIP 2620:52:0:1d0::10"
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=debug msg="Address 2620:52:0:1d0::23/128 with route {Ifindex: 14 Dst: 2620:52:0:1d0::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} contains VIP 2620:52:0:1d0::10"
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=error msg="Chosen node IP is not usable"
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=fatal msg="error in node-ip show: Failed to find node IP\n"
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com podman[28948]: 2024-03-06 08:38:42.47423805 +0000 UTC m=+0.118854068 container died 0845031d22bc568daa7b37beded6d283ac32635c48040ca0d32d3350ed170468 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fce4ca051a69ef4355aa954f874e2c51a10bc9a32d05e6b0acf2e6ba8c8b4d25, name=zealous_sinoussi, maintainer=Antoni Segura Puimedon <antoni@redhat.com>, description=Retrieves Node and Cluster information for baremetal network config, io.openshift.build.commit.id=57d34650a766dfed9ff8689318505b1953979774, version=v4.16.0, name=openshift/ose-baremetal-runtimecfg-rhel9, distribution-scope=public, vcs-ref=7d0a11643e812288820fc7f28f9b4c3acd344c66, architecture=x86_64, License=GPLv2+, com.redhat.component=ose-baremetal-runtimecfg-container, io.openshift.build.source-location=https://github.com/openshift/baremetal-runtimecfg, vendor=Red Hat, Inc., io.openshift.maintainer.component=Networking / runtime-cfg, io.openshift.build.commit.url=https://github.com/openshift/baremetal-runtimecfg/commit/57d34650a766dfed9ff8689318505b1953979774, io.openshift.tags=openshift,base, io.buildah.version=1.29.0, com.redhat.license_terms=https://www.redhat.com/agreements, release=202402141940.p0.g57d3465.assembly.stream.el9, summary=Provides the latest release of the Red Hat Extended Life Base Image., io.openshift.maintainer.project=OCPBUGS, vcs-type=git, build-date=2024-02-14T20:43:27, io.k8s.description=Retrieves Node and Cluster information for baremetal network config, url=https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-baremetal-runtimecfg-rhel9/images/v4.16.0-202402141940.p0.g57d3465.assembly.stream.el9, io.k8s.display-name=baremetal-runtimecfg, io.openshift.expose-services=)
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com podman[28948]: 2024-03-06 08:38:42.38526145 +0000 UTC m=+0.029877468 image pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fce4ca051a69ef4355aa954f874e2c51a10bc9a32d05e6b0acf2e6ba8c8b4d25
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Main process exited, code=exited, status=1/FAILURE
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Failed with result 'exit-code'.
Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: Failed to start Populates resolv.conf according to on-prem IPI needs.
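For readers following the debug messages in the log above, the two checks they mention (address/VIP stack match, and whether the route destination contains the VIP) can be illustrated with a small sketch. This is not the baremetal-runtimecfg implementation; the function names `same_stack` and `route_contains_vip` are hypothetical, and the sketch only uses the values that appear in the log. It does not explain why the chosen node IP was subsequently reported as not usable.

```python
import ipaddress


def same_stack(address: str, vip: str) -> bool:
    """Return True if the interface address and the VIP are the same IP family.

    Hypothetical helper mirroring the "Stack of address ... and VIP ... does
    not match. Skipping." debug message.
    """
    addr_ip = ipaddress.ip_interface(address).ip
    vip_ip = ipaddress.ip_address(vip)
    return addr_ip.version == vip_ip.version


def route_contains_vip(route_dst: str, vip: str) -> bool:
    """Return True if the route destination network contains the VIP.

    Hypothetical helper mirroring the "Address ... with route ... contains
    VIP ..." debug message.
    """
    net = ipaddress.ip_network(route_dst)
    vip_ip = ipaddress.ip_address(vip)
    # Guard against mixed families before testing membership.
    return vip_ip.version == net.version and vip_ip in net


if __name__ == "__main__":
    address = "2620:52:0:1d0::23/128"   # address from the log
    route_dst = "2620:52:0:1d0::/64"    # route Dst from the log
    for vip in ("10.1.208.10", "2620:52:0:1d0::10"):
        if not same_stack(address, vip):
            print(f"stack mismatch for VIP {vip}, skipping")
            continue
        print(f"route {route_dst} contains VIP {vip}: "
              f"{route_contains_vip(route_dst, vip)}")
```

With the values from the log, the IPv4 VIP is skipped because the address is IPv6, and the IPv6 VIP falls inside the /64 route, which matches the debug output. The failure therefore appears to occur after these checks, in whatever step decides that the chosen node IP is "not usable".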
Version-Release number of selected component (if applicable):
4.16.0-ec.3
How reproducible:
Intermittent: on-prem-resolv-prepender.service randomly fails to start on cluster nodes (both control-plane and worker).
Steps to Reproduce:
1. Deploy a baremetal dualstack cluster
2. Configure OVN on another NIC (br-ex1)
3. Reboot a node
Actual results:
on-prem-resolv-prepender.service fails to start and the affected nodes fail to join the cluster
Expected results:
on-prem-resolv-prepender.service starts successfully
Additional info:
Baremetal dualstack cluster deployed with IPI