Loading...

XML

Word

Printable

Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.16.0
Affects Version/s: 4.13, 4.12, 4.11, 4.14, 4.15, 4.16
Component/s: Node / CRI-O
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
Moderate
Regression:
No

Target Backport Versions:

4.13.z, 4.12.z, 4.14.z, 4.15.0
Target Version:

4.16.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
Removed Functionality
Release Note Text:

Hide
Previously, the CRI-O removed all of the images installed between minor version upgrades of {product-name} to ensure stale payload images didn't take up space on the node. However, it was decided this was a performance penalty, and this functionality was removed. With this fix, the kubelet will still garbage collect stale images after disk usage hits a certain level. As a result, {product-title} no longer removes all images after an upgrade between minor versions. (link:https://issues.redhat.com/browse/OCPBUGS-24743[*~~OCPBUGS-24743~~*])

Show
Previously, the CRI-O removed all of the images installed between minor version upgrades of {product-name} to ensure stale payload images didn't take up space on the node. However, it was decided this was a performance penalty, and this functionality was removed. With this fix, the kubelet will still garbage collect stale images after disk usage hits a certain level. As a result, {product-title} no longer removes all images after an upgrade between minor versions. (link: https://issues.redhat.com/browse/OCPBUGS-24743 [* OCPBUGS-24743 *])

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Since many 4.y ago, before 4.11 and all the minor versions that are still supported, CRI-O has wiped images when it comes up after a node reboot and notices it has a new (minor?) version. This causes redundant pulls, as seen in this 4.11-to-4.12 update run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade/1732741139229839360/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes/ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4/journal | zgrep 'Starting update from rendered-\|crio-wipe\|Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2'
Dec 07 13:05:42.474144 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 13:05:42.481470 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 191ms CPU time
Dec 07 13:59:51.000686 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1498]: time="2023-12-07 13:59:51.000591203Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=a62bc972-67d7-401a-9640-884430bd16f1 name=/runtime.v1.ImageService/PullImage
Dec 07 14:00:55.745095 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 root[101294]: machine-config-daemon[99469]: Starting update from rendered-worker-ca36a33a83d49b43ed000fd422e09838 to rendered-worker-c0b3b4eadfe6cdfb595b97fa293a9204: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Dec 07 14:05:33.274241 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 14:05:33.289605 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 216ms CPU time
Dec 07 14:14:50.277011 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1573]: time="2023-12-07 14:14:50.276961087Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=1a092fbd-7ffa-475a-b0b7-0ab115dbe173 name=/runtime.v1.ImageService/PullImage

The redundant pulls cost network and disk traffic, and avoiding them should make those update-initiated reboots quicker and cheaper. The lack of update-initiated wipes is not expected to cost much, because the Kubelet's old-image garbage collection should be along to clear out any no-longer-used images if disk space gets tight.

Version-Release number of selected component (if applicable):

At least 4.11. Possibly older 4.y; I haven't checked.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Update to a release image with a different CRI-O (minor?) version.
3. Check logs on the nodes.

Actual results:

crio-wipe entries in the logs, with reports of target-release images being pulled before and after those wipes, as I quoted in the Description.

Expected results:

Target-release images pulled before the reboot, and found in the local cache if that image is needed again post-reboot.

blocks

OCPBUGS-25228 Remove CRI-O-update-triggered image wipe

Closed

is cloned by

OCPBUGS-25228 Remove CRI-O-update-triggered image wipe

Closed

relates to

OCPNODE-1032 Investigate reprocussions of dropping crio-wipe image cleanup

Closed

links to

openshift/machine-config-operator#4068: OCPBUGS-24743: crio: drop automatic image cleanup on upgrades

RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update

Assignee:: Peter Hunt

Reporter:: W. Trevor King

QA Contact:: Sunil Choudhary

Need Info From:: None

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Created:: 2023/12/08 10:16 PM

Updated:: 2025/07/24 5:42 PM

Resolved:: 2024/06/27 11:25 AM

Details

Description

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates