Bug
Resolution: Done-Errata
Major
4.14.z
This is a clone of issue OCPBUGS-35300. The following is the description of the original issue:
—
Description of problem:
An ARO cluster fails to install with disconnected networking. Master nodes hang during boot on machine-config-daemon-pull.service. Logs from the service indicate that it cannot reach the public IP of the image registry. In ARO, traffic to image registries must go through a proxy. Dnsmasq is used to inject proxy DNS answers, but machine-config-daemon-pull.service starts before ARO's dnsmasq.service.
Version-Release number of selected component (if applicable):
4.14.16
How reproducible:
Always
Steps to Reproduce:
For Fresh Install:
1. Create the required ARO vnet and subnets.
2. Attach a route table to the subnets with a blackhole route for 0.0.0.0/0.
3. Create a 4.14 ARO cluster with --apiserver-visibility=Private --ingress-visibility=Private --outbound-type=UserDefinedRouting.
[OR]
Post Upgrade to 4.14:
1. Create an ARO 4.13 UDR cluster.
2. Upgrade the cluster 4.13 -> 4.14; the upgrade succeeds.
3. Create a new node (scale up). The same issue occurs.
Actual results:
For Fresh Install of 4.14: ERROR: (InternalServerError) Deployment failed.
[OR]
Post Upgrade to 4.14: Node doesn't come into a Ready state and Machine is stuck in Provisioned status.
Expected results:
The fresh install succeeds, and after an upgrade to 4.14 new nodes reach the Ready state.
Additional info:
The node logs show that machine-config-daemon-pull.service is unable to reach the image registry. ARO's dnsmasq had not yet started at that point.
Previously, systemd ordering was set so that ovs-configuration.service starts after (ARO's) dnsmasq.service. Perhaps that ordering should have been placed on machine-config-daemon-pull.service.
See https://issues.redhat.com/browse/OCPBUGS-25406.
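The ordering suggested above could be expressed as a systemd drop-in along the following lines. This is only a sketch of the idea in this report, not the fix that was shipped: the drop-in file name is hypothetical, and on a real cluster such a change would be delivered through the node configuration rather than edited by hand.

```ini
# Hypothetical drop-in path (file name is an assumption):
# /etc/systemd/system/machine-config-daemon-pull.service.d/10-after-dnsmasq.conf
[Unit]
# Ordering only: if dnsmasq.service is part of the boot transaction,
# machine-config-daemon-pull.service is not started until dnsmasq.service
# has been started. After= does not pull dnsmasq.service in by itself.
After=dnsmasq.service
```

After a `systemctl daemon-reload`, the effective ordering can be inspected with `systemctl show -p After machine-config-daemon-pull.service`.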
- blocks: OCPBUGS-36550 Disconnected ARO clusters fail to add new nodes after upgrading to 4.14 (Closed)
- clones: OCPBUGS-35300 Disconnected ARO clusters fail to add new nodes after upgrading to 4.14 (Closed)
- is blocked by: OCPBUGS-35300 Disconnected ARO clusters fail to add new nodes after upgrading to 4.14 (Closed)
- is cloned by: OCPBUGS-36550 Disconnected ARO clusters fail to add new nodes after upgrading to 4.14 (Closed)
- links to: RHBA-2024:4469 OpenShift Container Platform 4.16.z bug fix update