Resolution: Done-Errata
This is a clone of issue OCPBUGS-38012. The following is the description of the original issue:
Description of problem:
Customers are unable to scale up OCP nodes when the cluster was initially installed with OCP 4.8/4.9 and then upgraded to 4.15.22/4.15.23. At first, the customer observed that the node scale-up failed and that /etc/resolv.conf was empty on the new nodes. As a workaround, the customer copied the contents of a correct resolv.conf onto the node, after which setup of the new node continued. They then inspected the rendered MachineConfig, which is assembled with 00-worker, and suspected that something was wrong with the on-prem-resolv-prepender.service service definition. As a further workaround, the customer manually changed this service definition, which allowed them to scale up new nodes.
Version-Release number of selected component (if applicable):
4.15, 4.16
How reproducible:
Steps to Reproduce:
1. Install an OCP vSphere IPI cluster, version 4.8 or 4.9
2. Check the "on-prem-resolv-prepender.service" service definition
3. Upgrade the cluster to 4.15.22 or 4.15.23
4. Check whether node scaling works
5. Check the "on-prem-resolv-prepender.service" service definition again
Actual results:
Unable to scale up nodes with the default service definition. After manually changing the service definition, scaling works.
Expected results:
Node scaling should work without any manual changes to the service definition.
Additional info:
on-prem-resolv-prepender.service content on clusters built with 4.8 / 4.9 and then upgraded to 4.15.22 / 4.15.23:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
StartLimitIntervalSec=0
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~
After manually correcting the service definition as below, scaling works on 4.15.22 / 4.15.23:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0 -----------> this
[Service]
Type=oneshot
#Restart=on-failure -----------> this
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~
Below is the on-prem-resolv-prepender.service on a freshly installed 4.15.23 cluster, where scaling works fine:
~~~
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
# Per https://issues.redhat.com/browse/OCPBUGS-27162 there is a problem if this is started before crio-wipe
After=crio-wipe.service
StartLimitIntervalSec=0
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
~~~
This was observed in the rendered MachineConfig, which is assembled with 00-worker.
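The visible difference between the upgraded and the freshly installed unit is the section in which StartLimitIntervalSec=0 appears: [Service] on the upgraded cluster, [Unit] on the fresh install. The following is a minimal sketch, not part of the original report, of how a unit file's text could be checked for that placement. The function name `misplaced_unit_keys` and the `UNIT_ONLY_KEYS` set are hypothetical, and treating these keys as [Unit]-only is an assumption based on the systemd.unit documentation for current systemd versions.

```python
# Keys assumed to belong in the [Unit] section (per systemd.unit documentation).
UNIT_ONLY_KEYS = {"StartLimitIntervalSec", "StartLimitBurst"}


def misplaced_unit_keys(unit_text):
    """Return (key, section) pairs for [Unit]-only keys found in another section."""
    section = None
    misplaced = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track the current section header, e.g. [Unit] or [Service].
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key = line.split("=", 1)[0].strip()
        if key in UNIT_ONLY_KEYS and section != "Unit":
            misplaced.append((key, section))
    return misplaced


# Unit text as reported on the cluster upgraded from 4.8 / 4.9.
UPGRADED = """\
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
After=crio-wipe.service
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
StartLimitIntervalSec=0
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
"""

# Unit text as reported on the freshly installed 4.15.23 cluster.
FRESH = """\
[Unit]
Description=Populates resolv.conf according to on-prem IPI needs
After=crio-wipe.service
StartLimitIntervalSec=0
[Service]
Type=oneshot
Restart=on-failure
RestartSec=10
ExecStart=/usr/local/bin/resolv-prepender.sh
EnvironmentFile=/run/resolv-prepender/env
"""

print(misplaced_unit_keys(UPGRADED))
print(misplaced_unit_keys(FRESH))
```

Run against the two definitions quoted in this report, the check flags only the upgraded unit. It compares text only and does not confirm how systemd on the node actually interprets the misplaced key.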
- blocks
OCPBUGS-42109 Node sclaling failed due to misconfigurations in on-prem-resolv-prepender.service in RHOCP4
- Closed
- clones
OCPBUGS-38012 Node scaling failed due to misconfigurations in on-prem-resolv-prepender.service in RHOCP4
- Closed
- is blocked by
OCPBUGS-38012 Node scaling failed due to misconfigurations in on-prem-resolv-prepender.service in RHOCP4
- Closed
- is cloned by
OCPBUGS-42109 Node sclaling failed due to misconfigurations in on-prem-resolv-prepender.service in RHOCP4
- Closed
- links to
RHBA-2024:8434 OpenShift Container Platform 4.17.z bug fix update