-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
4.18, 4.19, 4.20
-
Quality / Stability / Reliability
-
False
-
Moderate
Description of problem:
During a cluster installation with the Agent-based Installer, temporary network instability can leave the installation stuck. After the nodes boot from the discovery ISO, if they fail to fetch images because of a temporary network issue, the services `agent-register-cluster` and `agent-register-infraenv.service` fail. By design, these services restart and fetch the images once the issue is fixed. However, the dependent services `apply-host-config` and `start-cluster-installation` do not restart because they are oneshot services. As a result, the installation still fails even though the network issue has been resolved and the images have been fetched.
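The behavior described above can be illustrated with a small toy model. This is only a sketch and not systemd's actual implementation: the `Unit` class, the state names, and the assumed dependency graph (`apply-host-config` requiring both register services, `start-cluster-installation` requiring `apply-host-config`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    requires: list = field(default_factory=list)  # units that must be active first
    restart_on_failure: bool = False              # models a unit that is retried
    state: str = "inactive"  # inactive | active | failed | dependency-failed


def boot(units, network_up):
    """Run the single boot-time start pass, in list order."""
    by_name = {u.name: u for u in units}
    for u in units:
        if any(by_name[d].state != "active" for d in u.requires):
            # The start attempt is dropped and never queued again.
            u.state = "dependency-failed"
        elif network_up:
            u.state = "active"
        else:
            u.state = "failed"


def retry_failed(units, network_up):
    """Retry only units that failed themselves and opted in to restarts."""
    for u in units:
        if u.state == "failed" and u.restart_on_failure and network_up:
            u.state = "active"


def simulate():
    # Assumed dependency graph; the real unit files may differ.
    units = [
        Unit("agent-register-cluster", restart_on_failure=True),
        Unit("agent-register-infraenv", restart_on_failure=True),
        Unit("apply-host-config",
             requires=["agent-register-cluster", "agent-register-infraenv"]),
        Unit("start-cluster-installation", requires=["apply-host-config"]),
    ]
    boot(units, network_up=False)         # images cannot be fetched
    retry_failed(units, network_up=True)  # network recovers, register units retry
    return {u.name: u.state for u in units}


if __name__ == "__main__":
    for name, state in simulate().items():
        print(f"{name}: {state}")
```

In this model the register services recover once the network returns, while the two dependent services stay in the dependency-failed state because nothing queues them again.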
Version-Release number of selected component (if applicable):
How reproducible:
Install a cluster using the Agent-based Installer in an environment with an unstable network, then observe the nodes after the temporary network issue has been resolved.
Steps to Reproduce:
1. Create a discovery ISO and boot the nodes from it.
2. Create a temporary network issue that blocks the images from being fetched.
3. Resolve the temporary network issue after some time and check the state of the services `apply-host-config` and `start-cluster-installation` on the nodes.
Actual results:
The installation fails even after the temporary network issue has been resolved.
Expected results:
The services `apply-host-config` and `start-cluster-installation` should restart after a dependency failure caused by a temporary issue.
Additional info:
- The agent gather, installation debug logs, and systemd logs from node0 can be found in the following drive. Link - https://drive.google.com/drive/folders/1ERConHJ-eXcQBiye-hM07U4ldEWu6XRz?usp=drive_link
- Slack thread where this was discussed. Link - https://redhat-internal.slack.com/archives/C02SPBZ4GPR/p1754890273671189