OpenShift Bugs / OCPBUGS-70168

node-image-pull.service fails to fetch the release image, resulting in installation failure


      The customer has been deploying OCP on Power using UPI and network boot for a long time. Until recently 4.20 deployed fine, but since 4.20.8 specifically the deployment fails every time.
      4.20.6 still deployed successfully. We see no record of a 4.20.7 deployment.
      The customer has tried almost 8 times.

      ~~~
      cmd:
      - /root/install/openshift-install
      - wait-for
      - bootstrap-complete
      - --dir
      - /root/install
      - --log-level
      - debug
      delta: '0:20:11.243963'
      end: '2025-12-16 18:38:35.347608'
      msg: non-zero return code
      rc: 5
      start: '2025-12-16 18:18:24.103645'
      stderr: |-
        level=debug msg=OpenShift Installer 4.20.8
        level=debug msg=Built from commit cc82f30cd640577297f66b5df80f0e08c55fd3fa
        level=info msg=Waiting up to 20m0s (until 6:38PM EST) for the Kubernetes API at https://api.p1313.cecc.ihost.com:6443...
        level=debug msg=Loading Agent Config...
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=debug msg=Still waiting for the Kubernetes API: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=error msg=Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.p1313.cecc.ihost.com:6443/apis/config.openshift.io/v1/clusteroperators": EOF
        level=info msg=Use the following commands to gather logs from the cluster
        level=info msg=openshift-install gather bootstrap --help
        level=error msg=Bootstrap failed to complete: Get "https://api.p1313.cecc.ihost.com:6443/version": EOF
        level=error msg=Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane.
      stderr_lines:
      stdout: ''
      ~~~

      On the OCP node we see the following:

      ~~~
      [root@p1390-master ~]# journalctl -b -f -u node-image-pull.service
      Dec 18 15:33:31 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[1999]: Failed to fetch release image; retrying...
      Dec 18 15:33:42 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4307]: layers already present: 45; layers needed: 8 (373.6 MB)
      Dec 18 15:33:42 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4307]: error: Importing: Unencapsulating base: Layer sha256:a1a95042c79ebdea459dd626fe0cd7b99e309c81e483be09e307a4714a08cd1e: Importing objects: Importing object 0b/c9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e.file: Processing content object 0bc9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e: Importing regfile small: Writing content object: min-free-space-percent '3%' would be exceeded, at least 65.5 kB requested
      Dec 18 15:33:42 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[1999]: Failed to fetch release image; retrying...
      Dec 18 15:33:53 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4354]: layers already present: 45; layers needed: 8 (373.6 MB)
      Dec 18 15:33:53 p1390-master.p1390.cecc.ihost.com node-image-pull.sh[4354]: error: Importing: Unencapsulating base: Layer sha256:a1a95042c79ebdea459dd626fe0cd7b99e309c81e483be09e307a4714a08cd1e: Importing objects: Importing object 0b/c9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e.file: Processing content object 0bc9515b64a0c9f95a12f3c1ee2fe41eb72e78e55e1261fc344ef03e9f32e80e: Importing regfile small: Writing content object: min-free-space-percent '3%' would be exceeded, at least 65.5 kB requested
      ~~~
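
The `min-free-space-percent '3%'` message above suggests the import stops once a write would leave less than 3% of the target filesystem free. The following is only a rough sketch of that arithmetic, not ostree's actual implementation; the function names and the sizes in the usage note are hypothetical.

```shell
# Hypothetical helpers; a sketch of the arithmetic, not ostree's code.

# Print the number of KB that must stay free on a filesystem of
# total_kb KB to honour a floor of min_pct percent (integer math).
min_free_reserve_kb() {
    total_kb=$1
    min_pct=$2
    echo $(( total_kb * min_pct / 100 ))
}

# Succeed (exit 0) when writing need_kb KB would push free space
# below the floor. Arguments: total_kb avail_kb need_kb min_pct.
would_exceed_min_free() {
    reserve_kb=$(min_free_reserve_kb "$1" "$4")
    [ $(( $2 - $3 )) -lt "$reserve_kb" ]
}
```

As an illustration, `min_free_reserve_kb 4194304 3` gives the 3% floor for a filesystem of exactly 4 GiB, which works out to roughly 123 MiB that has to remain free.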

      We further noticed that the filesystems created by the installer are too small to hold the image being imported:

      ~~~
      [root@p1390-master ~]# df -h
      Filesystem      Size  Used Avail Use% Mounted on
      devtmpfs        4.0M     0  4.0M   0% /dev
      tmpfs            64G  128K   64G   1% /dev/shm
      tmpfs            26G   52M   26G   1% /run
      tmpfs            64G  697M   64G   2% /run/ephemeral_base
      /dev/loop0       64G  1.1G   63G   2% /run/ephemeral
      /dev/loop1      1.1G  1.1G     0 100% /sysroot              <<<<<<<<<<<<<<<<<<<<<
      tmpfs            64G     0   64G   0% /tmp
      tmpfs           4.0G  3.9G  123M  98% /var/ostree-container <<<<<<<<<<<<<<<
      tmpfs            13G     0   13G   0% /run/user/1000
      ~~~
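
To spot nearly full mounts like the two flagged above without reading the table by eye, a small filter along these lines might help. It is a sketch only: `full_mounts` is a hypothetical name, the default threshold of 95 is an arbitrary choice, and the filter assumes `df`-style columns with no spaces in mount points.

```shell
# Hypothetical helper: print mount points whose usage is at or above
# a threshold percentage. Reads df-style output on stdin; the
# threshold (first argument) defaults to 95.
full_mounts() {
    threshold=${1:-95}
    awk -v t="$threshold" '
        NR > 1 {                  # skip the header row
            pct = $5
            sub(/%$/, "", pct)    # "98%" -> "98"
            if (pct + 0 >= t) print $6, $5
        }'
}
```

Typical use would be `df -P | full_mounts 95`, which prints one line per mount point at or above the threshold, followed by its usage percentage.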

      They then tried resizing the filesystem on the bootstrap node:

      [root@p1382-master ~]# mount -o remount,size=8G /var/ostree-container

      Right after that, the image download appeared to succeed:

      ~~~

      Dec 19 17:47:53 p1382-master.p1382.cecc.ihost.com node-image-pull.sh[7863]: Wrote: ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ea3cee12018835ed840252172334dd28e01f43b8630831e45c05893d688815a0 => f7f7813f8b2306ff62cfba2eb3de7d95652b55cb842454827e9d90163703ddc4
      Dec 19 17:47:54 p1382-master.p1382.cecc.ihost.com node-image-pull.sh[1540]: Checking out node image content
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: Finished Node Image Pull.
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: node-image-pull.service: Deactivated successfully.
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: Stopped Node Image Pull.
      Dec 19 17:47:55 p1382-master.p1382.cecc.ihost.com systemd[1]: node-image-pull.service: Consumed 1min 55.366s CPU time.
      ~~~

      The bootstrap process then appears to have started.
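
For anyone repeating the manual workaround described above across several nodes, it could be wrapped in a small helper with a dry-run switch. This is a sketch under assumptions: `grow_tmpfs` and the `DRY_RUN` variable are hypothetical names, and the underlying `mount -o remount,size=...` call is the same one used in this report, not a verified fix.

```shell
# Hypothetical wrapper around the manual remount workaround.
# Arguments: mount point, new tmpfs size (for example 8G).
# Set DRY_RUN=1 to print the command instead of running it.
grow_tmpfs() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mount -o remount,size=$2 $1"
    else
        # Requires root; only meaningful for tmpfs mounts.
        mount -o "remount,size=$2" "$1"
    fi
}
```

Running `DRY_RUN=1 grow_tmpfs /var/ostree-container 8G` prints the command that would be executed, which makes it easy to review before applying it on the node.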

              rhn-support-pamoedom Pedro Jose Amoedo Martinez
              rhn-support-puplench Pratik Uplenchwar
              Gaoyun Pei Gaoyun Pei