OpenShift Bugs / OCPBUGS-53289

OCP 4.19 fails to deploy with a bootkube.sh error; the node-image-overlay service on the node is in an inactive state


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.19.0
    • Quality / Stability / Reliability
    • Proposed

      Description of problem:

      Deployment of an OCP 4.19 cluster using the Assisted Installer fails with the error "bootkube.sh: line 86: oc: command not found". On closer inspection of the node, the node-image-overlay and node-image-pull services are in an inactive state.

      Version-Release number of selected component (if applicable):

      Red Hat Enterprise Linux CoreOS 9.6.20250121-0   

      How reproducible:

      Always

      Steps to Reproduce:

      1. Start an Assisted Installer cluster deployment using this repo (https://github.com/cs-zhang/ocp4-ai-powervm.git) and the following vars file:
      
      # cat vars.yaml
      ---
      # disk: /dev/sda
      helper:
        name: "helper"
        ipaddr: "9.114.96.246"
        #networkifacename: "env34"
      dns:
        domain: "ai.qa"
        clusterid: "rdr-suraj-ai-dry4"
        forwarder1: "9.9.9.9"
        forwarder2: "8.8.4.4"
      dhcp:
        router: "9.114.96.1"
        netmask: "255.255.252.0"
        subnet: "9.114.96.0/22"
      masters:
        - name: "master-1"
          ipaddr: "9.114.97.31"
          macaddr: "fa:3b:2d:34:88:20"
          pvmcec: C340F2U01-ZZ
          pvmlpar: rdr-suraj-abi-dced8481-00015c34
          disk: /dev/sda
        - name: "master-2"
          ipaddr: "9.114.97.25"
          macaddr: "fa:0f:b4:68:25:20"
          pvmcec: C340F2U01-ZZ
          pvmlpar: rdr-suraj-abi-457dbb21-00015c37
          disk: /dev/sda
        - name: "master-3"
          ipaddr: "9.114.96.249"
          macaddr: "fa:0e:52:d3:f2:20"
          pvmcec: C340F2U01-ZZ
          pvmlpar: rdr-suraj-abi-6c66908c-00015c3a
          disk: /dev/sda
      workers:
        - name: "worker-1"
          ipaddr: "9.114.97.224"
          macaddr: "fa:db:f4:ec:b1:20"
          pvmcec: C340F2U01-ZZ
          pvmlpar: rdr-suraj-abi-e965cc23-00015c3d
          disk: /dev/sda
        - name: "worker-2"
          ipaddr: "9.114.97.229"
          macaddr: "fa:dd:d0:b5:b9:20"
          pvmcec: C340F2U01-ZZ
          pvmlpar: rdr-suraj-abi-552e3495-00015c40
          disk: /dev/sda
        - name: "worker-3"
          ipaddr: "9.114.97.214"
          macaddr: "fa:f0:63:c7:9c:20"
          pvmcec: C340F2U01-ZZ
          pvmlpar: rdr-suraj-abi-2c35b112-00015c43
          disk: /dev/sda
      ########################
      force_ocp_download: true
      ######################
      # URL path to RHCOS download site
      rhcos_arch: "ppc64le"
      rhcos_base_url: "https://mirror.openshift.com/pub/openshift-v4/{{ rhcos_arch }}/dependencies/rhcos"
      rhcos_rhcos_base: "4.18"
      rhcos_rhcos_tag: "4.18.1"
      rhcos_iso: "{{ rhcos_base_url}}/{{ rhcos_rhcos_base }}/{{ rhcos_rhcos_tag }}/rhcos-live.{{ rhcos_arch }}.iso"
      rhcos_rootfs: "{{ rhcos_base_url}}/{{ rhcos_rhcos_base }}/{{ rhcos_rhcos_tag }}/rhcos-live-rootfs.{{ rhcos_arch }}.img"
      rhcos_initramfs: "{{ rhcos_base_url}}/{{ rhcos_rhcos_base }}/{{ rhcos_rhcos_tag }}/rhcos-live-initramfs.{{ rhcos_arch }}.img"
      rhcos_kernel: "{{ rhcos_base_url}}/{{ rhcos_rhcos_base }}/{{ rhcos_rhcos_tag }}/rhcos-live-kernel-{{ rhcos_arch }}"
      ocp_client_arch: "ppc64le"
      ocp_base_url: "https://mirror.openshift.com/pub/openshift-v4/multi/clients"
      ocp_client_base: "ocp-dev-preview"
      ocp_client_tag: "4.19.0-ec.3"
      ocp_client: "{{ ocp_base_url}}/{{ ocp_client_base }}/{{ ocp_client_tag }}/{{ ocp_client_arch }}/openshift-client-linux.tar.gz"
      ocp_installer: "{{ ocp_base_url}}/{{ ocp_client_base }}/{{ ocp_client_tag }}/{{ ocp_client_arch }}/openshift-install-linux.tar.gz"
      pvm_hmc: hscroot@9.114.195.140
      install_type: assisted
      assisted_url: "https://api.openshift.com/api/assisted-install/v2"
      assisted_token: ""
      assisted_ocp_version: "4.19.0-ec.3-multi"
      assisted_rhcos_version: "4.19.0-ec.3"
      pull_secret: '{{ lookup("file", "~/.openshift/pull-secret") | from_json | to_json }}'
      public_ssh_key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
      # need to use absolute path for workdir
      workdir: "/root/ocp4-{{ install_type }}"
      log_level: info  
      
      2. Observe the deployment: monitor the Assisted Installer (AI) events and logs, as well as journalctl on the target hosts.
      
      Note: with rhcos_rhcos_tag set to 4.18.0-rc.2, the deployment succeeds.
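
For reference, the templated download URLs in vars.yaml can be expanded by hand to see exactly which artifacts the playbook would fetch. The following is a minimal sketch, not the playbook's own rendering: it uses a small recursive substitution as a stand-in for Ansible/Jinja2 templating, and the `render` helper and `VARS` dictionary are hypothetical names introduced only for this illustration. Only a subset of the variables is copied from the file above.

```python
import re

# Subset of vars.yaml from this report, copied verbatim.
VARS = {
    "rhcos_arch": "ppc64le",
    "rhcos_base_url": "https://mirror.openshift.com/pub/openshift-v4/{{ rhcos_arch }}/dependencies/rhcos",
    "rhcos_rhcos_base": "4.18",
    "rhcos_rhcos_tag": "4.18.1",
    "rhcos_iso": "{{ rhcos_base_url}}/{{ rhcos_rhcos_base }}/{{ rhcos_rhcos_tag }}/rhcos-live.{{ rhcos_arch }}.iso",
    "ocp_client_arch": "ppc64le",
    "ocp_base_url": "https://mirror.openshift.com/pub/openshift-v4/multi/clients",
    "ocp_client_base": "ocp-dev-preview",
    "ocp_client_tag": "4.19.0-ec.3",
    "ocp_client": "{{ ocp_base_url}}/{{ ocp_client_base }}/{{ ocp_client_tag }}/{{ ocp_client_arch }}/openshift-client-linux.tar.gz",
}

# Matches "{{ name }}" with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(name, variables):
    """Expand variable `name`, recursively resolving {{ ... }} references."""
    value = variables[name]
    return PLACEHOLDER.sub(lambda m: render(m.group(1), variables), value)


if __name__ == "__main__":
    for key in ("rhcos_iso", "ocp_client"):
        print(key, "=", render(key, VARS))
```

Expanding the values this way makes it easy to compare the RHCOS tag (4.18.1) with the client and release tag (4.19.0-ec.3) when reproducing the issue.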

      Actual results:

      Deployment fails: the bootkube service exits with an "oc: command not found" error.

      Expected results:

      Deployment should succeed.

      Additional info:

      [core@master-2 ~]$ journalctl -b -f -u release-image.service -u bootkube.service
      Mar 19 06:45:40 master-2 podman[46748]: 2025-03-19 06:45:40.283028622 +0000 UTC m=+0.065964633 container start f4d198fff7d75b665adfb7054fb3499d70ed21bcc5f05564c756ef4024d47e9c (image=quay.io/openshift-release-dev/ocp-release@sha256:fb8754ad482b4932229d96530c848c84b14a7bdba6f47e739121151f977a5ae8, name=youthful_colden, io.openshift.release.base-image-digest=sha256:2d4dcf5920f9e55cf6276d07b2898d4c2c6c5cc7071b37abcb4376e09777a21d, io.openshift.release=4.19.0-ec.3)
      Mar 19 06:45:40 master-2 podman[46748]: 2025-03-19 06:45:40.283785915 +0000 UTC m=+0.066721964 container attach f4d198fff7d75b665adfb7054fb3499d70ed21bcc5f05564c756ef4024d47e9c (image=quay.io/openshift-release-dev/ocp-release@sha256:fb8754ad482b4932229d96530c848c84b14a7bdba6f47e739121151f977a5ae8, name=youthful_colden, io.openshift.release.base-image-digest=sha256:2d4dcf5920f9e55cf6276d07b2898d4c2c6c5cc7071b37abcb4376e09777a21d, io.openshift.release=4.19.0-ec.3)
      Mar 19 06:45:40 master-2 youthful_colden[46777]: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23fbfe9e7d08ef8d2d3af6b4725702e649d07066e54e99438a9253ab85e49c2a
      Mar 19 06:45:40 master-2 podman[46748]: 2025-03-19 06:45:40.325459394 +0000 UTC m=+0.108395446 container died f4d198fff7d75b665adfb7054fb3499d70ed21bcc5f05564c756ef4024d47e9c (image=quay.io/openshift-release-dev/ocp-release@sha256:fb8754ad482b4932229d96530c848c84b14a7bdba6f47e739121151f977a5ae8, name=youthful_colden, io.openshift.release=4.19.0-ec.3, io.openshift.release.base-image-digest=sha256:2d4dcf5920f9e55cf6276d07b2898d4c2c6c5cc7071b37abcb4376e09777a21d)
      Mar 19 06:45:40 master-2 podman[46748]: 2025-03-19 06:45:40.241401821 +0000 UTC m=+0.024337833 image pull b299fe51fb2e157ab7d8152b94b4a303ae422fc8e90cb9a81e67c06f5035a56d quay.io/openshift-release-dev/ocp-release@sha256:fb8754ad482b4932229d96530c848c84b14a7bdba6f47e739121151f977a5ae8
      Mar 19 06:45:40 master-2 podman[46748]: 2025-03-19 06:45:40.338256371 +0000 UTC m=+0.121192397 container remove f4d198fff7d75b665adfb7054fb3499d70ed21bcc5f05564c756ef4024d47e9c (image=quay.io/openshift-release-dev/ocp-release@sha256:fb8754ad482b4932229d96530c848c84b14a7bdba6f47e739121151f977a5ae8, name=youthful_colden, io.openshift.release=4.19.0-ec.3, io.openshift.release.base-image-digest=sha256:2d4dcf5920f9e55cf6276d07b2898d4c2c6c5cc7071b37abcb4376e09777a21d)
      Mar 19 06:45:40 master-2 bootkube.sh[46815]: /usr/local/bin/bootkube.sh: line 86: oc: command not found
      Mar 19 06:45:40 master-2 systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
      Mar 19 06:45:40 master-2 systemd[1]: bootkube.service: Failed with result 'exit-code'.
      Mar 19 06:45:40 master-2 systemd[1]: bootkube.service: Consumed 2.830s CPU time.
      Mar 19 06:45:45 master-2 systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 33.
      Mar 19 06:45:45 master-2 systemd[1]: Stopped Bootstrap a Kubernetes cluster.
      Mar 19 06:45:45 master-2 systemd[1]: bootkube.service: Consumed 2.830s CPU time.
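
The journal excerpt above can be summarized programmatically when triaging several hosts. The following is a rough sketch under stated assumptions: the regular expressions are written against the exact message formats seen in this log, and `summarize_journal` is a hypothetical helper, not part of any OpenShift tooling. Exit status 127 is the shell's conventional code for "command not found", which matches the bootkube.sh message.

```python
import re

# Patterns derived from the journal lines in this report.
NOT_FOUND_RE = re.compile(r"line (\d+): (\S+): command not found")
EXIT_RE = re.compile(r"(\S+\.service): Main process exited, code=exited, status=(\d+)")
RESTART_RE = re.compile(r"(\S+\.service): Scheduled restart job, restart counter is at (\d+)")

# Three lines copied from the journalctl output above.
SAMPLE_LINES = [
    "Mar 19 06:45:40 master-2 bootkube.sh[46815]: /usr/local/bin/bootkube.sh: line 86: oc: command not found",
    "Mar 19 06:45:40 master-2 systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a",
    "Mar 19 06:45:45 master-2 systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 33.",
]


def summarize_journal(lines):
    """Collect missing commands, the last exit status, and the last restart counter."""
    summary = {"missing_commands": set(), "exit_status": None, "restart_counter": None}
    for line in lines:
        match = NOT_FOUND_RE.search(line)
        if match:
            # Store (command, script line number) pairs.
            summary["missing_commands"].add((match.group(2), int(match.group(1))))
        match = EXIT_RE.search(line)
        if match:
            summary["exit_status"] = int(match.group(2))
        match = RESTART_RE.search(line)
        if match:
            summary["restart_counter"] = int(match.group(2))
    return summary


if __name__ == "__main__":
    print(summarize_journal(SAMPLE_LINES))
```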
      
      
      
      [core@master-2 ~]$ sudo systemctl status node-image-overlay
      ○ node-image-overlay.service - Node Image Overlay
           Loaded: loaded (/etc/systemd/system/node-image-overlay.service; static)
           Active: inactive (dead)
      
      
      [core@master-2 ~]$ sudo systemctl status node-image-pull
      ○ node-image-pull.service - Node Image Pull
           Loaded: loaded (/etc/systemd/system/node-image-pull.service; static)
           Active: inactive (dead)
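
Both units above report "static" and "inactive (dead)". In systemd, a static unit has no [Install] section, so it is started only when another unit or an explicit request pulls it in; whether that is relevant to this failure is an assumption, not something the output confirms. The sketch below parses `systemctl status` text of the shape shown here so the same check can be repeated across nodes. `parse_unit_status` is a hypothetical helper, and the patterns cover only the fields visible in this report.

```python
import re

# Status text copied from the node-image-overlay output above.
STATUS_TEXT = """\
○ node-image-overlay.service - Node Image Overlay
     Loaded: loaded (/etc/systemd/system/node-image-overlay.service; static)
     Active: inactive (dead)
"""

UNIT_RE = re.compile(r"(\S+\.service) - ")
LOADED_RE = re.compile(r"Loaded:\s+(\S+)\s+\(([^;]+);\s*([^;)]+)")
ACTIVE_RE = re.compile(r"Active:\s+(\S+)\s+\(([^)]+)\)")


def parse_unit_status(text):
    """Extract unit name, load state, unit file path, enablement, and active state."""
    unit = UNIT_RE.search(text)
    loaded = LOADED_RE.search(text)
    active = ACTIVE_RE.search(text)
    return {
        "unit": unit.group(1) if unit else None,
        "load_state": loaded.group(1) if loaded else None,
        "unit_file": loaded.group(2) if loaded else None,
        "enablement": loaded.group(3) if loaded else None,
        "active_state": active.group(1) if active else None,
        "sub_state": active.group(2) if active else None,
    }


if __name__ == "__main__":
    status = parse_unit_status(STATUS_TEXT)
    if status["enablement"] == "static" and status["active_state"] == "inactive":
        print(status["unit"], "is static and inactive")
```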
      
      
      [core@master-2 ~]$ cat /etc/os-release
      NAME="Red Hat Enterprise Linux CoreOS"
      VERSION="9.6.20250121-0 (Plow)"
      ID="rhel"
      ID_LIKE="fedora"
      VERSION_ID="9.6"
      PLATFORM_ID="platform:el9"
      PRETTY_NAME="Red Hat Enterprise Linux CoreOS 9.6.20250121-0 (Plow)"
      ANSI_COLOR="0;31"
      LOGO="fedora-logo-icon"
      CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
      HOME_URL="https://www.redhat.com/"
      DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
      BUG_REPORT_URL="https://issues.redhat.com/"
      REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
      REDHAT_BUGZILLA_PRODUCT_VERSION=9.6
      REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
      REDHAT_SUPPORT_PRODUCT_VERSION="9.6 Beta"
      OSTREE_VERSION='9.6.20250121-0'
      VARIANT=CoreOS
      VARIANT_ID=coreos  

              lgamliel liat gamliel
              sgudaji1 Suraj Gudaji (Inactive)