[OCPBUGS-43787] Missing /etc/resolv.conf file on new node - Red Hat Issue Tracker

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: 4.16
Component/s: Machine Config Operator / platform-openstack
Labels:
- Triaged

Test Coverage: -
Regression: None
Blocked: False
Blocked Reason: None


Description of problem:

When I scale up my cluster or recreate a worker node, the machine is created successfully in OpenStack but never becomes a node.

Version-Release number of selected component (if applicable): 4.16.16

How reproducible:

Steps to Reproduce:

    1. Scale up worker machineset or remove one of the existing worker machines

Actual results:

The new machine is created but never becomes a node. When I SSH into the worker node VM, I see that:
* pods are not being created and image pulls fail
* /etc/resolv.conf is missing
* systemctl status on-prem-resolv-prepender.service shows the following:

Warning: The unit file, source configuration file or drop-ins of on-prem-resolv-prepender.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● on-prem-resolv-prepender.service - Populates resolv.conf according to on-prem IPI needs
   Loaded: bad-setting (Reason: Unit on-prem-resolv-prepender.service has a bad unit file setting.)
   Active: inactive (dead)
    

Content of the service file
$ cat /etc/systemd/system/on-prem-resolv-prepender.service
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
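
The status output reports `bad-setting` without naming the offending directive. As a rough triage aid, the sketch below parses the unit text shown above and flags one combination that older systemd releases are known to reject: a `Restart=` value other than `no` on a `Type=oneshot` service. This is an assumption about a possible cause, not something this report confirms. `find_oneshot_restart_conflict` is a hypothetical helper written for illustration; on the affected node, `systemd-analyze verify` on the unit file would give the authoritative answer.

```python
import configparser

# Unit file content copied from the report (comment line omitted).
UNIT_TEXT = """\
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
After=crio-wipe.service
StartLimitIntervalSec=0
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
"""


def find_oneshot_restart_conflict(unit_text):
    """Return a description of the suspected conflict, or None.

    Hypothetical helper: it checks only the Type=/Restart= pairing and
    does not replicate systemd's own unit validation.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep directive names case-sensitive
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return None
    service = parser["Service"]
    unit_type = service.get("Type", "simple")
    restart = service.get("Restart", "no")
    if unit_type == "oneshot" and restart != "no":
        return f"Type=oneshot combined with Restart={restart}"
    return None


if __name__ == "__main__":
    print(find_oneshot_restart_conflict(UNIT_TEXT))
```

If the helper reports a conflict, comparing the systemd version on the new machine with the version on a healthy worker would help confirm or rule out this explanation.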

Expected results:

The new node is created and becomes ready in a reasonable time.

Additional info:

The cluster is deployed on PSI OpenStack. It was originally installed as 4.9.12 and has since been updated multiple times, up to 4.16.16.

When I create the /etc/resolv.conf file manually (copied from a healthy worker node), the process continues and the machine becomes a healthy node within a couple of minutes.

# content of /etc/resolv.conf
nameserver 192.168.2.148
nameserver 10.11.5.19
nameserver 10.2.32.1
search cicd.ospqa.com
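
Since the missing file is the visible symptom, a small check can confirm whether a node has a usable resolver configuration before and after the manual workaround. The sketch below is an illustration only: `parse_resolv_conf` and `check_resolv_conf` are hypothetical helpers, and they handle just the `nameserver` and `search` keywords that appear in the file above.

```python
from pathlib import Path


def parse_resolv_conf(text):
    """Collect nameserver and search entries from resolv.conf text."""
    nameservers, search = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            search = fields[1:]
    return {"nameserver": nameservers, "search": search}


def check_resolv_conf(path="/etc/resolv.conf"):
    """Return a one-line summary of the resolver file at `path`."""
    p = Path(path)
    if not p.is_file():
        return f"{path} is missing"
    parsed = parse_resolv_conf(p.read_text())
    if not parsed["nameserver"]:
        return f"{path} has no nameserver entries"
    return f"{path} lists {len(parsed['nameserver'])} nameserver(s)"


if __name__ == "__main__":
    print(check_resolv_conf())
```

Run on the broken worker, this would be expected to report the file as missing; after copying the file from a healthy node it should list the nameserver count instead.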

Issue Links:
impacts account: OCPBUGS-38012 Node scaling failed due to misconfigurations in on-prem-resolv-prepender.service in RHOCP4 (Closed)

Assignee: Matthew Booth
Reporter: Pavol Pitoňák
QA Contact: Itshak Brown
Votes: 0
Watchers: 2
Created: 2024/10/24 11:43 AM
Updated: 2024/11/19 4:40 PM
Resolved: 2024/11/19 4:38 PM
