Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13, 4.12, 4.14, 4.15, 4.16, 4.17, 4.18
Component/s: Cluster Version Operator
Labels:
- NeedTestCases

Severity:
Moderate
Regression:
None
Epic Link:
Production/public instance of OSUS should be able to scale without causing issues in a multi-tenant environment- phase2
Story Points:
3
Sprint:
OTA 261, OTA 262
sprint_count:
2
Blocked:
False
Blocked Reason:

None
Release Note Type:
Release Note Not Required
Release Note Status:
In Progress
Target Version:

4.18.0
Target Backport Versions:

4.18

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem

Update advice is append-only: 4.y.z releases are added to channels regularly, and new update risks are declared occasionally. This makes caching very safe, and client-side caching in the CVO would reduce the disruption caused by OpenShift Update Service (OSUS) outages like OTA-1376.

Version-Release number of selected component

A single failed update-service retrieval currently clears the cache in 4.18. The code is pretty old, so I expect this behavior goes back through 4.12, our oldest release that's not yet end-of-life.

How reproducible

Every time.

Steps to Reproduce

1. Run a happy cluster with update advice.
2. Break the update service, e.g. by using OTA-520 for a mock update service.
3. Wait a few minutes for the cluster to notice the breakage.
4. Check its update recommendations, with oc adm upgrade or the new-in-4.18 oc adm upgrade recommend.

Actual results

No recommendations while the cluster is RetrievedUpdates=False.

Expected results

Preserving the cached recommendations while the cluster is RetrievedUpdates=False, at least for 24 hours. I'm not committed to a particular time, but 24h is much larger than any OSUS outage we've ever had, and still not so long that we'd expect much in the way of recommendation changes if the service had remained healthy.

is depended on by

OTA-1376 2024-10-22 Cincinnati / Update Service outage

Closed

links to

openshift/cluster-version-operator#1098: OCPBUGS-44018: pkg/cvo/availableupdates: Preserve update advice on update-service failures

RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update

Assignee: W. Trevor King

Reporter: W. Trevor King

QA Contact: Dinesh Kumar S

Votes: 0

Watchers: 5

Created: 2024/10/30 3:06 PM

Updated: 2025/02/25 4:49 AM

Resolved: 2025/02/25 4:49 AM
