- Bug
- Resolution: Won't Do
- Undefined
- None
- 4.19.0, 4.19
- Critical
- None
- Rejected
- False
Description of problem:
Deployment of OCP 4.19, starting at least with 4.19.0-0.nightly-2025-02-10-034243, fails. journalctl on the target host shows bootkube.service in an error loop because bootkube.sh cannot find the oc command:

Feb 11 15:54:59 spoke.redacted.redhat.com bootkube.sh[3070774]: /usr/local/bin/bootkube.sh: line 85: oc: command not found
Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: bootkube.service: Consumed 5.916s CPU time.

This is seen in multiple environments, on both SNO and multi-node, and with different hub cluster deployments, including hubs running 4.18.0 builds.
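For triage, the journal excerpt can be scanned for the failing command mechanically. The following is a minimal sketch, not part of the installer: the helper name `missing_commands` and the regular expression are assumptions made for illustration, written against the exact message format shown in the excerpt. Exit status 127 in the systemd line is the shell's conventional status for a command that could not be found, which is consistent with the `oc: command not found` message.

```python
import re

# Matches the shell's "command not found" message as it appears in the
# journal excerpt: "<script>: line <N>: <command>: command not found".
# The pattern is an assumption based on that excerpt only.
NOT_FOUND = re.compile(
    r"(?P<script>\S+): line (?P<line>\d+): (?P<command>\S+): command not found"
)


def missing_commands(journal_lines):
    """Return (script, line_number, command) for each 'command not found' entry."""
    hits = []
    for entry in journal_lines:
        match = NOT_FOUND.search(entry)
        if match:
            hits.append((match["script"], int(match["line"]), match["command"]))
    return hits


# Two lines copied from the journal excerpt in this report.
sample = [
    "Feb 11 15:54:59 spoke.redacted.redhat.com bootkube.sh[3070774]: "
    "/usr/local/bin/bootkube.sh: line 85: oc: command not found",
    "Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: "
    "bootkube.service: Main process exited, code=exited, status=127/n/a",
]

for script, line_number, command in missing_commands(sample):
    print(f"{script} line {line_number}: missing command '{command}'")
```

Feeding it the saved journal output of the target host would list every script and line that failed this way, which helps confirm whether `oc` is the only missing binary.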
Version-Release number of selected component (if applicable):
OCP 4.19.0-0.nightly-2025-02-10-034243
How reproducible:
Always
Steps to Reproduce:
1. Start a SNO cluster deployment with the Telco RAN DU profile. The specific kind of deployment probably doesn't matter.
2. Observe the deployment and monitor the Assisted Installer (AI) events and logs, as well as journalctl on the target hardware.
Actual results:
Deployment fails with the bootkube service in a crash loop on the SNO spoke.
Expected results:
Deployment should succeed.
Additional info:
Additional logs will be added in a comment.

Event logs:

(end of the preceding event, cut off in this excerpt)
    severity "info"

29
    cluster_id "32721c81-0967-470d-92ad-8316b9e2c25b"
    event_time "2025-02-10T19:20:01.250Z"
    host_id "4592dcf8-b85a-0a38-868a-e6d530de835e"
    infra_env_id "ced3a39a-650c-4d46-a341-02a9b0ab4d50"
    message "Host spoke.redacted.redhat.com: updated status from installing-in-progress to error (Host failed to install because its installation stage Waiting for bootkube took longer than expected 1h0m0s)"
    name "host_status_updated"
    severity "error"

30
    cluster_id "32721c81-0967-470d-92ad-8316b9e2c25b"
    event_time "2025-02-10T19:20:09.238Z"
    message "Updated status of the cluster to error"
    name "cluster_status_updated"
    severity "info"

31
    cluster_id "32721c81-0967-470d-92ad-8316b9e2c25b"
    event_time "2025-02-10T19:20:09.241Z"
    message "Failed installing cluster. Reason: cluster has hosts in error"
    name "cluster_installation_failed"
    severity "critical"

32
    cluster_id "32721c81-0967-470d-92ad-8316b9e2c25b"
    event_time "2025-02-10T19:20:35.819Z"
    host_id "4592dcf8-b85a-0a38-868a-e6d530de835e"
    infra_env_id "ced3a39a-650c-4d46-a341-02a9b0ab4d50"
    message "Uploaded logs for host spoke.redacted.redhat.com cluster 32721c81-0967-470d-92ad-8316b9e2c25b"
    name "host_logs_uploaded"
    severity "info"
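To pull the failure-relevant entries out of a longer event log, the events can be filtered by severity. This is a rough sketch under stated assumptions: the list-of-dicts representation and the helper name `failure_events` are hypothetical, and only the `name`, `severity`, and `event_time` values are transcribed from the events in this report.

```python
# Events transcribed from this report; only the fields used below are kept.
# Representing them as a list of dicts is an assumption for illustration.
events = [
    {"name": "host_status_updated", "severity": "error",
     "event_time": "2025-02-10T19:20:01.250Z"},
    {"name": "cluster_status_updated", "severity": "info",
     "event_time": "2025-02-10T19:20:09.238Z"},
    {"name": "cluster_installation_failed", "severity": "critical",
     "event_time": "2025-02-10T19:20:09.241Z"},
    {"name": "host_logs_uploaded", "severity": "info",
     "event_time": "2025-02-10T19:20:35.819Z"},
]

# Severities treated as failures in this sketch.
FAILURE_SEVERITIES = {"error", "critical"}


def failure_events(all_events):
    """Return events with a failure severity, ordered by event_time.

    The timestamps share one fixed-width UTC format, so plain string
    comparison orders them chronologically.
    """
    failures = [e for e in all_events if e["severity"] in FAILURE_SEVERITIES]
    return sorted(failures, key=lambda e: e["event_time"])


for event in failure_events(events):
    print(event["event_time"], event["severity"], event["name"])
```

On the events above this keeps the host status change to error and the cluster installation failure, and drops the two informational entries.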
- relates to
- ACM-18277 Failed to install 4.19 spoke with "mkdirat: Read-only" error from node-image-pull.sh when fetching release image
- Closed