Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: 4.14.0
Affects Version/s: 4.13, 4.12, 4.11, 4.10
Component/s: Cluster Version Operator
Labels:
None

Regression:
None
Story Points:
3
Sprint:
OTA 230, OTA 231, OTA 232, OTA 233
sprint_count:
4
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Text:

Previously, the Cluster Version Operator did not prioritize likely targets when deciding which conditional update risks to evaluate first. With this change, it begins doing so, and conditional updates where the risks happen to not apply will become available more quickly after coming to the Operator's attention (for example, after a channel change). (link:https://issues.redhat.com/browse/OCPBUGS-5469[*OCPBUGS-5469*])
Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.14.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Changing channels can introduce several new conditional update risks that need to be evaluated. For instance, a cluster running 4.10.34 in a 4.10 channel today only has to evaluate `OpenStackNodeCreationFails`, but when the channel is changed to a 4.11 channel, multiple new risks require evaluation, and the evaluation of new risks is throttled to one every 10 minutes. With three new risks, it may take up to 30 minutes after the channel change for the full set of conditional updates to be computed. This leads to a perception that no update paths are recommended, because most users will not wait 30 minutes; they expect immediate feedback.
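The arithmetic behind the 30-minute figure can be sketched in Go. This is a minimal illustration, assuming the throttle behaves exactly as described above (one new risk evaluated per 10-minute interval). The names `evaluationInterval` and `worstCaseWait` are hypothetical and do not come from the Cluster Version Operator source.

```go
package main

import (
	"fmt"
	"time"
)

// evaluationInterval mirrors the throttle described in this report:
// at most one new risk is evaluated every 10 minutes.
const evaluationInterval = 10 * time.Minute

// worstCaseWait returns an upper bound on how long it takes to evaluate
// newRisks previously unseen risks when one risk is evaluated per interval.
// Hypothetical helper for illustration only.
func worstCaseWait(newRisks int, interval time.Duration) time.Duration {
	if newRisks <= 0 {
		return 0
	}
	return time.Duration(newRisks) * interval
}

func main() {
	for _, n := range []int{1, 2, 3} {
		fmt.Printf("%d new risk(s): up to %v\n", n, worstCaseWait(n, evaluationInterval))
	}
}
```

The bound grows linearly with the number of new risks, which is why a channel change that brings in several risks at once feels unresponsive.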

Version-Release number of selected component (if applicable):

4.10.z, 4.11.z, 4.12, 4.13

How reproducible:

100%

Steps to Reproduce:

1. Install 4.10.34
2. Switch from stable-4.10 to stable-4.11

Actual results:

No recommended updates are shown for 10-20 minutes, because all available paths to 4.11 have a risk associated with them.

Expected results:

Risks are computed quickly enough for an interactive UX, let's say in under 10 seconds.

Additional info:

This was intentional in the design: we didn't want risks to be continuously re-evaluated or to overwhelm the monitoring stack. However, we didn't anticipate that we'd have a long-standing pile of risks, and we didn't realize how confusing the user experience would be.

We intend to work around this in the deployed fleet by converting older risks from `type: PromQL` to `type: Always`, which avoids the evaluation period but preserves the notification. While this may lead customers to believe they're exposed to a risk when they are not, as long as the set of outstanding risks to the latest version is limited to no more than one, it's likely no one will notice. All 4.10 and 4.11 clusters currently have a clear path toward relatively recent 4.10.z or 4.11.z with no more than one risk to be evaluated.
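Separately from the workaround, the fix linked from this issue prioritizes conditional risks for the largest target version. The Go sketch below illustrates that idea only; it is not the Cluster Version Operator's actual code. The `conditionalUpdate` type, the helper functions, the risk names `RiskA` and `RiskB`, and the target versions are all hypothetical, and the version parser handles plain `X.Y.Z` strings only.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// conditionalUpdate is a simplified, hypothetical stand-in for a conditional
// update: a target version plus the names of the risks that guard it.
type conditionalUpdate struct {
	Version string
	Risks   []string
}

// parseVersion splits "X.Y.Z" into integers. Pre-release and build
// metadata are deliberately not handled in this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected X.Y.Z, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// versionLess compares two parsed versions component by component.
func versionLess(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// riskOrder returns risk names in the order they would be evaluated:
// risks guarding the largest target version first, each risk listed once.
func riskOrder(updates []conditionalUpdate) ([]string, error) {
	type parsed struct {
		version [3]int
		update  conditionalUpdate
	}
	ps := make([]parsed, 0, len(updates))
	for _, u := range updates {
		v, err := parseVersion(u.Version)
		if err != nil {
			return nil, err
		}
		ps = append(ps, parsed{version: v, update: u})
	}
	// Sort descending by target version.
	sort.SliceStable(ps, func(i, j int) bool {
		return versionLess(ps[j].version, ps[i].version)
	})
	seen := map[string]bool{}
	var order []string
	for _, p := range ps {
		for _, r := range p.update.Risks {
			if !seen[r] {
				seen[r] = true
				order = append(order, r)
			}
		}
	}
	return order, nil
}

func main() {
	updates := []conditionalUpdate{
		{Version: "4.10.40", Risks: []string{"OpenStackNodeCreationFails"}},
		{Version: "4.11.5", Risks: []string{"RiskA"}},
		{Version: "4.11.9", Risks: []string{"RiskB", "RiskA"}},
	}
	order, err := riskOrder(updates)
	if err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```

Under the one-risk-per-interval throttle, evaluating the risks for the newest target first means the update a user is most likely to want becomes available after the fewest intervals, even if the remaining risks take longer to resolve.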

blocks

OCPBUGS-10221 Risk cache warming takes too long on channel changes

Closed

is cloned by

OCPBUGS-10221 Risk cache warming takes too long on channel changes

Closed

is related to

OCPBUGS-19512 Faster risk cache warming

Closed

OCPBUGS-13308 Conditional update "unknown due to an evaluation failure: client-side throttling" message is not clear

Closed

links to

openshift/cluster-version-operator#909: OCPBUGS-5469: pkg/cvo/availableupdates: Prioritize conditional risks for largest target version

RHEA-2023:5006 rpm


Assignee: W. Trevor King

Reporter: Scott Dodson

QA Contact: Yang Yang

Votes: 1

Watchers: 12

Created: 2023/01/06 4:52 PM

Updated: 2023/10/31 1:31 PM

Resolved: 2023/10/31 12:56 PM
