-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.17.0
-
None
This is a clone of issue OCPBUGS-50011. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-41811. The following is the description of the original issue:
—
Component Readiness has found a potential regression in the following test:
operator conditions network
Probability of significant regression: 99.42%
Sample (being evaluated) Release: 4.17
Start Time: 2024-09-04T00:00:00Z
End Time: 2024-09-11T23:59:59Z
Success Rate: 60.00%
Successes: 6
Failures: 4
Flakes: 0
Base (historical) Release: 4.16
Start Time: 2024-05-28T00:00:00Z
End Time: 2024-06-27T23:59:59Z
Success Rate: 100.00%
Successes: 22
Failures: 0
Flakes: 0
The bug is being opened against the component we see the regression for. However we see 4 out of 10 jobs have failed install with missing worker nodes.
Slack thread
Had a look at the agent-gather and found just the worker ones (as expected, since they did not join the bootstrap). From the journal the workers were able to fetch the ignition, ie ... set 09 21:19:27 worker-0 assisted-installer[2823]: time="2024-09-09T19:19:27Z" level=info msg="Getting ignition from https://192.168.111.5:22623/config/worker" ... and ready to reboot: ... set 09 21:20:31 worker-0 assisted-installer[2823]: time="2024-09-09T19:20:31Z" level=info msg="Uploading logs and reporting status before rebooting the node 2cc1856b-c4f5-4e3e-9117-128cf97e1d15 for cluster abe52fa7-9e7e-465e-a5c5-457ecb49bb70" ... But the reboot never happened, and they remaining stuck (thus not completing the joining procedure). The reason of the stuck it's not yet clear
- clones
-
OCPBUGS-50011 4.17 Failed workers reboot in HA topology prevents cluster deployment completion
-
- Closed
-
- is blocked by
-
OCPBUGS-50011 4.17 Failed workers reboot in HA topology prevents cluster deployment completion
-
- Closed
-
- links to