OCPBUGS-30318

on-prem-resolv-prepender.service fails after reboot - error in node-ip show: Failed to find node IP


    • Critical
    • No
    • Rejected
    • False

      Description of problem:

      After a node reboot, on-prem-resolv-prepender.service fails to start, which prevents the node from joining the cluster.
          
      systemctl status on-prem-resolv-prepender.service --no-pager -l
      × on-prem-resolv-prepender.service - Populates resolv.conf according to on-prem IPI needs
           Loaded: loaded (/etc/systemd/system/on-prem-resolv-prepender.service; static)
           Active: failed (Result: exit-code) since Wed 2024-03-06 08:38:42 UTC; 4h 47min ago
          Process: 27970 ExecStart=/usr/local/bin/resolv-prepender.sh (code=exited, status=1/FAILURE)
         Main PID: 27970 (code=exited, status=1/FAILURE)
              CPU: 319ms
      
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=debug msg="Stack of address '2620:52:0:1d0::23/128' and VIP '10.1.208.10' does not match. Skipping."
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=debug msg="Checking whether address 2620:52:0:1d0::23/128 with route {Ifindex: 14 Dst: 2620:52:0:1d0::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} contains VIP 2620:52:0:1d0::10"
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=debug msg="Address 2620:52:0:1d0::23/128 with route {Ifindex: 14 Dst: 2620:52:0:1d0::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254} contains VIP 2620:52:0:1d0::10"
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=error msg="Chosen node IP is not usable"
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com zealous_sinoussi[28996]: time="2024-03-06T08:38:42Z" level=fatal msg="error in node-ip show: Failed to find node IP\n"
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com podman[28948]: 2024-03-06 08:38:42.47423805 +0000 UTC m=+0.118854068 container died 0845031d22bc568daa7b37beded6d283ac32635c48040ca0d32d3350ed170468 (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fce4ca051a69ef4355aa954f874e2c51a10bc9a32d05e6b0acf2e6ba8c8b4d25, name=zealous_sinoussi, maintainer=Antoni Segura Puimedon <antoni@redhat.com>, description=Retrieves Node and Cluster information for baremetal network config, io.openshift.build.commit.id=57d34650a766dfed9ff8689318505b1953979774, version=v4.16.0, name=openshift/ose-baremetal-runtimecfg-rhel9, distribution-scope=public, vcs-ref=7d0a11643e812288820fc7f28f9b4c3acd344c66, architecture=x86_64, License=GPLv2+, com.redhat.component=ose-baremetal-runtimecfg-container, io.openshift.build.source-location=https://github.com/openshift/baremetal-runtimecfg, vendor=Red Hat, Inc., io.openshift.maintainer.component=Networking / runtime-cfg, io.openshift.build.commit.url=https://github.com/openshift/baremetal-runtimecfg/commit/57d34650a766dfed9ff8689318505b1953979774, io.openshift.tags=openshift,base, io.buildah.version=1.29.0, com.redhat.license_terms=https://www.redhat.com/agreements, release=202402141940.p0.g57d3465.assembly.stream.el9, summary=Provides the latest release of the Red Hat Extended Life Base Image., io.openshift.maintainer.project=OCPBUGS, vcs-type=git, build-date=2024-02-14T20:43:27, io.k8s.description=Retrieves Node and Cluster information for baremetal network config, url=https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-baremetal-runtimecfg-rhel9/images/v4.16.0-202402141940.p0.g57d3465.assembly.stream.el9, io.k8s.display-name=baremetal-runtimecfg, io.openshift.expose-services=)
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com podman[28948]: 2024-03-06 08:38:42.38526145 +0000 UTC m=+0.029877468 image pull  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fce4ca051a69ef4355aa954f874e2c51a10bc9a32d05e6b0acf2e6ba8c8b4d25
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Main process exited, code=exited, status=1/FAILURE
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Failed with result 'exit-code'.
      Mar 06 08:38:42 openshift-worker-0.kni-qe-4.lab.eng.rdu2.redhat.com systemd[1]: Failed to start Populates resolv.conf according to on-prem IPI needs.
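
The debug lines above show two checks: an address is skipped when its IP family differs from the VIP's, and an address counts as matching when the destination prefix of its route contains the VIP. The following is a minimal sketch of those two checks only, using the values from the log. It is not the baremetal-runtimecfg code (which is written in Go), the function names `same_stack` and `route_contains_vip` are hypothetical, and it does not reproduce whatever later check produced "Chosen node IP is not usable".

```python
import ipaddress


def same_stack(address_cidr: str, vip: str) -> bool:
    """Return True when the address and the VIP are both IPv4 or both IPv6."""
    address = ipaddress.ip_interface(address_cidr)
    return address.version == ipaddress.ip_address(vip).version


def route_contains_vip(route_dst: str, vip: str) -> bool:
    """Return True when the VIP falls inside the route's destination prefix."""
    network = ipaddress.ip_network(route_dst)
    candidate = ipaddress.ip_address(vip)
    return candidate.version == network.version and candidate in network


if __name__ == "__main__":
    address = "2620:52:0:1d0::23/128"   # address from the log
    route_dst = "2620:52:0:1d0::/64"    # Dst of the route from the log
    for vip in ("10.1.208.10", "2620:52:0:1d0::10"):
        if not same_stack(address, vip):
            print(f"{address} vs {vip}: stack mismatch, skipping")
        else:
            matched = route_contains_vip(route_dst, vip)
            print(f"{address} vs {vip}: route contains VIP = {matched}")
```

Note that the address itself carries a /128 prefix, so its own network holds only the address; the match with the IPv6 VIP comes from the /64 route destination, as the log line states.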
      

      Version-Release number of selected component (if applicable):

      4.16.0-ec.3
          

      How reproducible:

      Intermittent. on-prem-resolv-prepender.service randomly fails to start on cluster nodes (both control-plane and worker).
          

      Steps to Reproduce:

          1. Deploy a baremetal dual-stack cluster
          2. Configure OVN on another NIC (br-ex1)
          3. Reboot a node
          

      Actual results:

      on-prem-resolv-prepender.service fails to start, and the affected nodes fail to join the cluster.
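
When triaging a failure like this one, the error and fatal entries are easy to miss among the debug output and the long podman metadata line. The following is a minimal sketch of a filter for them, assuming the journal lines carry `level=... msg="..."` fields as in the excerpt in the description. The names `severe_messages` and `SAMPLE` are hypothetical and not part of any OpenShift tooling.

```python
import re

# Matches the level and the quoted message of a logrus-style line,
# allowing backslash escapes (such as \n) inside the message.
LOGRUS_RE = re.compile(r'level=(?P<level>\w+)\s+msg="(?P<msg>(?:[^"\\]|\\.)*)"')

# Three lines copied from the journal excerpt, without the syslog prefix.
SAMPLE = [
    r'''time="2024-03-06T08:38:42Z" level=debug msg="Stack of address '2620:52:0:1d0::23/128' and VIP '10.1.208.10' does not match. Skipping."''',
    r'time="2024-03-06T08:38:42Z" level=error msg="Chosen node IP is not usable"',
    r'time="2024-03-06T08:38:42Z" level=fatal msg="error in node-ip show: Failed to find node IP\n"',
]


def severe_messages(lines, levels=("error", "fatal")):
    """Return (level, message) pairs for lines whose level is in `levels`."""
    found = []
    for line in lines:
        match = LOGRUS_RE.search(line)
        if match and match.group("level") in levels:
            found.append((match.group("level"), match.group("msg")))
    return found


if __name__ == "__main__":
    for level, message in severe_messages(SAMPLE):
        print(f"{level}: {message}")
```

The same function can be fed lines read from saved `journalctl` output for the unit; lines without a `level=` field, such as the systemd and podman entries, are simply ignored.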
          

      Expected results:

      on-prem-resolv-prepender.service starts successfully after the reboot.
          

      Additional info:

      Baremetal dualstack cluster deployed with IPI
          

            bnemec@redhat.com Benjamin Nemec
            yprokule@redhat.com Yurii Prokulevych
            Zhanqi Zhao