OpenShift Bugs / OCPBUGS-50585

OCP 4.19 failing to deploy due to bootkube.sh error, missing oc command


    • Critical
    • None
    • Rejected
    • False

      Description of problem:

      
      Deployment of OCP 4.19 fails, starting at least with 4.19.0-0.nightly-2025-02-10-034243.
      journalctl shows the bootkube.sh service stuck in an error loop:
      
      Feb 11 15:54:59 spoke.redacted.redhat.com bootkube.sh[3070774]: /usr/local/bin/bootkube.sh: line 85: oc: command not found
      Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
      Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
      Feb 11 15:54:59 spoke.redacted.redhat.com systemd[1]: bootkube.service: Consumed 5.916s CPU time.
      
      This is seen in multiple environments, on both SNO and multi-node clusters, and with different hub cluster deployments, including hubs running 4.18.0 builds.
      
          

      Version-Release number of selected component (if applicable):

          OCP 4.19.0-0.nightly-2025-02-10-034243
      
          

      How reproducible:

          Always
          

      Steps to Reproduce:

          1. Start an SNO cluster deployment with the Telco RAN DU profile (the specific
             kind of deployment probably doesn't matter).
          2. Observe the deployment and monitor the Assisted Installer (AI) events and logs,
             as well as journalctl on the target hardware.
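          A minimal sketch of the journal check in step 2, assuming the bootkube journal has been captured as text (for example with `journalctl -u bootkube.service --no-pager` on the target host). The helper `find_bootkube_failures` is hypothetical; it only matches the two signatures from the log excerpt in the description. Exit status 127 is the shell's conventional code for "command not found".

```python
import re
from typing import Iterable, List

# Signatures copied from the bootkube.service journal excerpt in this report.
# Pattern 1: the shell reports a missing command inside bootkube.sh.
CMD_NOT_FOUND = re.compile(
    r"bootkube\.sh\[\d+\]: (?P<script>\S+): line (?P<line>\d+): "
    r"(?P<cmd>\S+): command not found"
)
# Pattern 2: systemd records the unit exiting with status 127.
EXIT_127 = re.compile(
    r"bootkube\.service: Main process exited, code=exited, status=127\b"
)


def find_bootkube_failures(lines: Iterable[str]) -> List[str]:
    """Return one short finding per journal line matching either signature."""
    findings: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = CMD_NOT_FOUND.search(line)
        if match:
            findings.append(
                f"{match.group('script')} line {match.group('line')}: "
                f"'{match.group('cmd')}' not found"
            )
        elif EXIT_127.search(line):
            findings.append("bootkube.service exited with status 127")
    return findings
```

          For the excerpt in the description, this yields one finding for `oc` at line 85 of `/usr/local/bin/bootkube.sh` and one for the unit's exit status; the "Failed with result" and "Consumed ... CPU time" lines are ignored.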
          

      Actual results:

      Deployment fails with the bootkube service in a crash loop on the SNO spoke.
          

      Expected results:

      Deployment should succeed.
          

      Additional info:

      
              Additional logs will be added in a comment
      
              Event logs:

              Event 28 (truncated):
                  severity        "info"

              Event 29:
                  cluster_id      "32721c81-0967-470d-92ad-8316b9e2c25b"
                  event_time      "2025-02-10T19:20:01.250Z"
                  host_id         "4592dcf8-b85a-0a38-868a-e6d530de835e"
                  infra_env_id    "ced3a39a-650c-4d46-a341-02a9b0ab4d50"
                  message         "Host spoke.redacted.redhat.com: updated status from installing-in-progress to error (Host failed to install because its installation stage Waiting for bootkube took longer than expected 1h0m0s)"
                  name            "host_status_updated"
                  severity        "error"

              Event 30:
                  cluster_id      "32721c81-0967-470d-92ad-8316b9e2c25b"
                  event_time      "2025-02-10T19:20:09.238Z"
                  message         "Updated status of the cluster to error"
                  name            "cluster_status_updated"
                  severity        "info"

              Event 31:
                  cluster_id      "32721c81-0967-470d-92ad-8316b9e2c25b"
                  event_time      "2025-02-10T19:20:09.241Z"
                  message         "Failed installing cluster. Reason: cluster has hosts in error"
                  name            "cluster_installation_failed"
                  severity        "critical"

              Event 32:
                  cluster_id      "32721c81-0967-470d-92ad-8316b9e2c25b"
                  event_time      "2025-02-10T19:20:35.819Z"
                  host_id         "4592dcf8-b85a-0a38-868a-e6d530de835e"
                  infra_env_id    "ced3a39a-650c-4d46-a341-02a9b0ab4d50"
                  message         "Uploaded logs for host spoke.redacted.redhat.com cluster 32721c81-0967-470d-92ad-8316b9e2c25b"
                  name            "host_logs_uploaded"
                  severity        "info"
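              The event log above can also be triaged programmatically. This sketch assumes the events are already loaded as a list of dictionaries using the field names shown (for example, parsed from an exported JSON event list); `failure_events` and `timed_out_stage` are hypothetical helpers, not part of any Assisted Installer client. It keeps only the `error` and `critical` events and extracts the installation stage that exceeded its timeout.

```python
import re
from typing import Dict, List, Optional, Tuple

# Severities treated as failures, based on the events shown above.
FAILURE_SEVERITIES = {"error", "critical"}

# Matches the timeout wording in the host_status_updated message above.
STAGE_TIMEOUT = re.compile(
    r"installation stage (?P<stage>.+?) took longer than expected (?P<timeout>[^)\s]+)"
)


def failure_events(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return error/critical events in chronological order.

    The event_time values share one fixed-width ISO 8601 UTC format,
    so sorting the strings also sorts them by time.
    """
    failed = [e for e in events if e.get("severity") in FAILURE_SEVERITIES]
    return sorted(failed, key=lambda e: e["event_time"])


def timed_out_stage(message: str) -> Optional[Tuple[str, str]]:
    """Return (stage, timeout) if the message reports a stage timeout."""
    match = STAGE_TIMEOUT.search(message)
    if match is None:
        return None
    return match.group("stage"), match.group("timeout")
```

              Applied to events 29 to 32, this keeps `host_status_updated` and `cluster_installation_failed`, and the first message yields the stage `Waiting for bootkube` with timeout `1h0m0s`, which points back to the bootkube.sh failure in the description.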
          

              ncarboni@redhat.com Nick Carboni
              rhn-support-dgonyier Dwaine Gonyier
              Michael Burman
              Votes: 0
              Watchers: 9
