Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.15.0
Affects Version/s: 4.13, 4.12, 4.11, 4.14, 4.15, 4.16
Component/s: Node / CRI-O
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
No

Target Backport Versions:

4.13.z, 4.12.z, 4.14.z, 4.15.0
Target Version:

4.15.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
Done
Release Note Type:
Removed Functionality
Release Note Text:

* Removed the automatic image removal on upgrade. Previously, when OpenShift performed a minor upgrade, CRI-O automatically removed the container images on the node, which caused issues with pre-pulled images. Now the images are kept and are instead subject to the kubelet's image garbage collection, which triggers based on disk usage. (link:https://issues.redhat.com/browse/OCPBUGS-25228[*OCPBUGS-25228*])

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue OCPBUGS-24743. The following is the description of the original issue:
—

Description of problem:

For many 4.y releases, going back to before 4.11 and covering all the minor versions that are still supported, CRI-O has wiped images when it comes up after a node reboot and notices it has a new (minor?) version. This causes redundant pulls, as seen in this 4.11-to-4.12 update run:

$ curl -s  https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade/1732741139229839360/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes/ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4/journal | zgrep 'Starting update from rendered-\|crio-wipe\|Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2'
Dec 07 13:05:42.474144 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 13:05:42.481470 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 191ms CPU time
Dec 07 13:59:51.000686 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1498]: time="2023-12-07 13:59:51.000591203Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=a62bc972-67d7-401a-9640-884430bd16f1 name=/runtime.v1.ImageService/PullImage
Dec 07 14:00:55.745095 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 root[101294]: machine-config-daemon[99469]: Starting update from rendered-worker-ca36a33a83d49b43ed000fd422e09838 to rendered-worker-c0b3b4eadfe6cdfb595b97fa293a9204: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Dec 07 14:05:33.274241 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 14:05:33.289605 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 216ms CPU time
Dec 07 14:14:50.277011 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1573]: time="2023-12-07 14:14:50.276961087Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=1a092fbd-7ffa-475a-b0b7-0ab115dbe173 name=/runtime.v1.ImageService/PullImage

The redundant pulls cost network and disk traffic, and avoiding them should make those update-initiated reboots quicker and cheaper. The lack of update-initiated wipes is not expected to cost much, because the Kubelet's old-image garbage collection should be along to clear out any no-longer-used images if disk space gets tight.

Version-Release number of selected component (if applicable):

At least 4.11. Possibly older 4.y; I haven't checked.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Update to a release image with a different CRI-O (minor?) version.
3. Check logs on the nodes.

Actual results:

crio-wipe entries in the logs, with reports of target-release images being pulled before and after those wipes, as I quoted in the Description.
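
The check described above can be sketched as a small script over journal lines. This is a hedged illustration, not a tool from the original report: the function name `redundant_pulls` is hypothetical, and the matching assumes the `crio-wipe.service: Succeeded.` and `Pulled image: ...` message formats shown in the quoted excerpt.

```python
import re
from collections import defaultdict

# Patterns assume the journal message formats quoted in the Description.
PULLED = re.compile(r'Pulled image: ([^\s"]+)')
WIPE_DONE = "crio-wipe.service: Succeeded."


def redundant_pulls(journal_lines):
    """Return images that were pulled on both sides of a crio-wipe run."""
    wipes_seen = 0
    pulls = defaultdict(set)  # image -> set of wipe counts seen at pull time
    for line in journal_lines:
        if WIPE_DONE in line:
            wipes_seen += 1
            continue
        match = PULLED.search(line)
        if match:
            pulls[match.group(1)].add(wipes_seen)
    # An image pulled under two different wipe counts was pulled again
    # after crio-wipe ran at least once in between.
    return sorted(image for image, epochs in pulls.items() if len(epochs) > 1)
```

It could be fed the node journal fetched by the `curl` command in the Description, for example via `redundant_pulls(sys.stdin)`. A hit only shows that an image was pulled again after a crio-wipe run; it does not by itself prove the wipe removed the image, since crio-wipe runs on every boot and other cleanup could also be responsible.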

Expected results:

Target-release images pulled before the reboot, and found in the local cache if that image is needed again post-reboot.

blocks

OCPBUGS-26500 Remove CRI-O-update-triggered image wipe

Closed

clones

OCPBUGS-24743 Remove CRI-O-update-triggered image wipe

Closed

is blocked by

OCPBUGS-24743 Remove CRI-O-update-triggered image wipe

Closed

is cloned by

OCPBUGS-26500 Remove CRI-O-update-triggered image wipe

Closed

links to

openshift/machine-config-operator#4072: [release-4.15] OCPBUGS-25228: crio: drop automatic image cleanup on upgrades

RHSA-2023:7198 OpenShift Container Platform 4.15 security update

Assignee: Node Team Bot Account

Reporter: OpenShift Prow Bot

QA Contact: Sunil Choudhary

Need Info From: None

Votes: 0

Watchers: 7

Created: 2023/12/12 3:07 PM

Updated: 2025/07/24 11:50 AM

Resolved: 2024/02/27 9:05 PM
