OCPBUGS-77551

node-image-pull.service is failing to fetch release image, resulting in installation failure

    • Moderate
    • Yes
    • ppc64le
    • None
    • None
    • In Progress
    • Bug Fix
    • The tmpfs volume used to store the ostree image on the bootstrap node is too small to contain the whole image for the ppc64le architecture. The tmpfs size has been increased to support the architecture.

      This is a clone of issue OCPBUGS-70168. The following is the description of the original issue:

      The customer has been deploying OCP on Power using UPI and network boot for a long time, and is now trying to deploy 4.20.8 specifically. Until recently 4.20 worked fine, but since 4.20.8 the deployment apparently fails every time.
      4.20.6 still deployed successfully. We have no record of a 4.20.7 deployment.
      The customer has tried almost 8 times.

      ~~~
      cmd:
      - /root/install/openshift-install
      - wait-for
      - bootstrap-complete
      - --dir
      - /root/install
      - --log-level
      - debug
      delta: '0:20:11.243963'
      end: '2025-12-16 18:38:35.347608'
      msg: non-zero return code
      rc: 5
      start: '2025-12-16 18:18:24.103645'
      stderr: |-
        level=debug msg=OpenShift Installer 4.20.8
        level=debug msg=Built from commit cc82f30cd640577297f66b5df80f0e08c55fd3fa
        level=info msg=Waiting up to 20m0s (until 6:38PM EST) for the Kubernetes API at https://api.p1313.cecc.ihost.com:6443...
        level=debug msg=Loading Agent Config...
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=error msg=Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.p1313.cecc.ihost.com:6443/apis/config.openshift.io/v1/clusteroperators": EOF
        level=info msg=Use the following commands to gather logs from the cluster
        level=info msg=openshift-install gather bootstrap --help
        level=error msg=Bootstrap failed to complete: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=error msg=Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane.
      stderr_lines:
      stdout: ''
      ~~~

      The journal on the OCP node shows the following:

      ~~~
      [root@p1390-master ~]# journalctl -b -f -u node-image-pull.service
      Dec 18 15:33:31 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[1999]: Failed to fetch release image; retrying...
      Dec 18 15:33:42 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4307]: layers already present: 45; layers needed: 8 (373.6 MB)
      Dec 18 15:33:42 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4307]: error: Importing: Unencapsulating base: Layer sha256:a1a95042c79ebdea459dd626fe0cd7b99e309c81e483be09e307a4714a08cd1e: Importing objects: Importing object 0b/c9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e.file: Processing content object 0bc9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e: Importing regfile small: Writing content object: min-free-space-percent '3%' would be exceeded, at least 65.5 kB requested
      Dec 18 15:33:42 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[1999]: Failed to fetch release image; retrying...
      Dec 18 15:33:53 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4354]: layers already present: 45; layers needed: 8 (373.6 MB)
      Dec 18 15:33:53 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4354]: error: Importing: Unencapsulating base: Layer sha256:a1a95042c79ebdea459dd626fe0cd7b99e309c81e483be09e307a4714a08cd1e: Importing objects: Importing object 0b/c9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e.file: Processing content object 0bc9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e: Importing regfile small: Writing content object: min-free-space-percent '3%' would be exceeded, at least 65.5 kB requested
      ~~~
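
The repeated failure above has a recognizable signature in the journal. As a rough sketch, the helper below (the function name `classify_pull_failure` is hypothetical, not part of any OpenShift tooling) reads journal text on stdin and reports whether it contains the ostree free-space error quoted in this report. It only matches the message text seen here and makes no claim about other failure modes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify node-image-pull failures from journal text.
# Prints "out-of-space" if the ostree min-free-space error appears on stdin,
# otherwise "other".
classify_pull_failure() {
    if grep -q "min-free-space-percent '[0-9]*%' would be exceeded"; then
        echo "out-of-space"
    else
        echo "other"
    fi
}

# Possible usage on the node (same unit as in this report, without -f):
#   journalctl -b -u node-image-pull.service | classify_pull_failure
```

If this prints `out-of-space`, the filesystem that the image is imported into is the first thing to check.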

      We further noticed that the filesystems created by the installer are too small to hold the image (see the marked lines):

      ~~~
      [root@p1390-master ~]# df -h
      Filesystem      Size  Used Avail Use% Mounted on
      devtmpfs        4.0M     0  4.0M   0% /dev
      tmpfs            64G  128K   64G   1% /dev/shm
      tmpfs            26G   52M   26G   1% /run
      tmpfs            64G  697M   64G   2% /run/ephemeral_base
      /dev/loop0       64G  1.1G   63G   2% /run/ephemeral
      /dev/loop1      1.1G  1.1G     0 100% /sysroot <<<<<<<<<<<<<<<<<<<<<
      tmpfs            64G     0   64G   0% /tmp
      tmpfs           4.0G  3.9G  123M  98% /var/ostree-container <<<<<<<<<<<<<<<
      tmpfs            13G     0   13G   0% /run/user/1000
      ~~~
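
The numbers above can be checked with simple arithmetic. The sketch below is a simplified model, not ostree's actual implementation: it assumes a write is refused when the free space left afterwards would fall below the reserve percentage of the filesystem size. The function name `fits_with_reserve` is hypothetical, and the inputs are rounded from the `df` and journal output in this report (373.6 MB is the reported size of the layers still needed; the space required once they are unpacked may well be larger).

```shell
#!/usr/bin/env bash
# Hypothetical helper: would NEEDED_MB fit on a filesystem of SIZE_MB with
# AVAIL_MB free while keeping RESERVE_PCT percent of the size free?
# Integer arithmetic only; all sizes are in MiB.
fits_with_reserve() {
    local size_mb=$1 avail_mb=$2 needed_mb=$3 reserve_pct=$4
    local reserve_mb=$(( size_mb * reserve_pct / 100 ))
    if (( avail_mb - needed_mb >= reserve_mb )); then
        echo yes
    else
        echo no
    fi
}

# 4.0G tmpfs, 123M available, about 374 MB still needed, 3% reserve.
fits_with_reserve 4096 123 374 3     # prints "no"

# Same usage after growing the tmpfs to 8G (about 4096 MiB more available).
fits_with_reserve 8192 4219 374 3    # prints "yes"
```

With a 4.0G tmpfs the 3% reserve is about 122 MiB, which is almost exactly the 123M still available, so under this model even a 65.5 kB write would be rejected.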

      They then tried resizing the filesystem on the bootstrap node:

      [root@p1382-master ~]# mount -o remount,size=8G /var/ostree-container

      Right after that, the image download succeeded:

      ~~~

      Dec 19 17:47:53 p1382-master.p1382.cecc.ihost.com node-image-pull.sh[7863]: Wrote: ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ea3cee12018835ed840252172334dd28e01f43b8630831e45c05893d688815a0 => f7f7813f8b2306ff62cfba2eb3de7d95652b55cb842454827e9d90163703ddc4
      Dec 19 17:47:54 p1382-master.p1382.cecc.ihost.com node-image-pull.sh[1540]: Checking out node image content
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: Finished Node Image Pull.
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: node-image-pull.service: Deactivated successfully.
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: Stopped Node Image Pull.
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: node-image-pull.service: Consumed 1min 55.366s CPU time.
      ~~~

      The bootstrap process then appeared to start.
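
To confirm that a remount like the one above took effect, the size and free space of the mount can be read back. The sketch below assumes GNU coreutils `df` (for `--output` and `-BM`); the function name `mount_space_mib` is hypothetical. Note that a `mount -o remount` change applies only to the running system, so it is a workaround for the current boot rather than a fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<size> <avail>" in MiB for the filesystem
# backing a path. Assumes GNU coreutils df.
mount_space_mib() {
    local path=${1:-/var/ostree-container}
    # Drop the header line and strip the trailing "M" unit from each value.
    df -BM --output=size,avail "$path" | tail -n 1 | tr -d 'M'
}

# Possible usage on the bootstrap node, before and after the remount:
#   mount_space_mib /var/ostree-container
```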

              zabitter Zane Bitter
              rhn-support-puplench Pratik Uplenchwar
              Rama Kasturi Narra Rama Kasturi Narra