Type: Spike
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- UpgradeBlocker

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
None
Story Points:
None

Target Version:
None
Sprint:
OCPNODE Sprint 232 (Green), OCPNODE Sprint 233 (Green)

We're asking the following questions to evaluate whether or not ~~OCPBUGS-7719~~ warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.

Sample answers are provided to give more context and the ImpactStatementRequested label has been added to ~~OCPBUGS-7719~~. When responding, please move this ticket to Code Review. The expectation is that the assignee answers these questions.

Which 4.y.z to 4.y'.z' updates increase vulnerability?

reasoning: This allows us to populate from and to in conditional update recommendations for "the $SOURCE_RELEASE to $TARGET_RELEASE update is exposed".
example: Customers upgrading from any 4.y to 4.10.52+, 4.11.26+, 4.12.2+, or 4.13.0-ec.3. Use oc adm upgrade to show your current cluster version.
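As a rough illustration of how the exposed targets in the sample answer could be checked mechanically, here is a minimal Python sketch. The function name `is_exposed_target` and the two lookup tables are hypothetical; the version thresholds are copied from the example above, and pre-releases other than 4.13.0-ec.3 are treated as not covered because the sample answer does not mention them.

```python
import re

# Thresholds copied from the sample answer: "4.10.52+" means 4.10.52 and later
# z-streams of 4.10, and so on. These names are hypothetical helpers.
EXPOSED_FLOORS = {(4, 10): 52, (4, 11): 26, (4, 12): 2}
# Pre-releases named explicitly in the sample answer.
EXPOSED_EXACT = {"4.13.0-ec.3"}

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(.+))?$")


def is_exposed_target(version: str) -> bool:
    """Return True if `version` is one of the target releases listed as exposed."""
    if version in EXPOSED_EXACT:
        return True
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    if match.group(4) is not None:
        # Assumption: other pre-releases are not covered by the sample answer.
        return False
    floor = EXPOSED_FLOORS.get((major, minor))
    return floor is not None and patch >= floor
```

For instance, `is_exposed_target("4.12.2")` is true while `is_exposed_target("4.12.1")` is false.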

Which types of clusters?

reasoning: This allows us to populate matchingRules in conditional update recommendations for "clusters like $THIS".
example: Clusters with leaked MachineConfig caused by having more than one KubeletConfig or ContainerRuntimeConfig targeting a single non-control-plane MachineConfigPool while running 4.12.0-ec.1 or later. Check your vulnerability with oc ... or the following PromQL count (...) > 0.

The two questions above are sufficient to declare an initial update risk, and we would like as much detail as possible on them as quickly as you can get it. Perfectly crisp responses are nice, but are not required. For example "it seems like these platforms are involved, because..." in a day 1 draft impact statement is helpful, even if you follow up with "actually, it was these other platforms" on day 3. In the absence of a response within 7 days, we may or may not declare a conditional update risk based on our current understanding of the issue.

If you can, answers to the following questions will make the conditional risk declaration more actionable for customers.

What is the impact? Is it serious enough to warrant removing update recommendations?

reasoning: This allows us to populate name and message in conditional update recommendations for "...because if you update, $THESE_CONDITIONS may cause $THESE_UNFORTUNATE_SYMPTOMS".
example: MachineConfigPool rollouts, including the one rolling into the incoming release, will wedge. Depending on timing and pool configuration, this might block the OCP update from completing.
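The fields named in the questions so far (from, to, matchingRules, name, message) can be pictured together as one record. The Python sketch below is only illustrative: the `ConditionalRisk` class is hypothetical, it does not reproduce the actual schema used for conditional update declarations, the `from` pattern is an assumed placeholder for "any source release", and the PromQL expression is left elided exactly as it is in the sample answer.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionalRisk:
    """Hypothetical container for the fields the questions above ask about."""

    name: str
    message: str
    to: str
    from_: str  # trailing underscore because "from" is a Python keyword
    matching_rules: list = field(default_factory=list)

    def as_mapping(self) -> dict:
        """Return the fields keyed by the names used in the ticket text."""
        return {
            "to": self.to,
            "from": self.from_,
            "name": self.name,
            "message": self.message,
            "matchingRules": list(self.matching_rules),
        }


risk = ConditionalRisk(
    name="LeakedMachineConfigBlocksMCO",  # name taken from the linked pull request title
    message=(
        "MachineConfigPool rollouts, including the one rolling into "
        "the incoming release, will wedge."
    ),
    to="4.12.2",  # one of the exposed targets from the sample answer
    from_=".*",  # assumption: placeholder meaning "any source release"
    matching_rules=["count (...) > 0"],  # elided in the ticket; left as-is
)
```

Calling `risk.as_mapping()` yields a plain dictionary that a reviewer could compare against the answers given in the ticket.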

How involved is remediation?

reasoning: This allows administrators who are already vulnerable, or who chose to waive conditional-update risks, to recover their cluster. And even moderately serious impacts might be acceptable if they are easy to mitigate.
example: Delete the leaked MachineConfigs that are mentioned in the MachineConfigPool error condition.

Is this a regression?

reasoning: Updating between two vulnerable releases may not increase exposure (unless rebooting during the update increases vulnerability, etc.). We only qualify update recommendations if the update increases exposure.
example: Yes, 4.10.52+, 4.11.26+, 4.12.2+, and 4.13.0-ec.3 landed code that pivoted the machine-config component's treatment of leaked MachineConfig from "ignore" to "block on".
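The rule in the reasoning above (qualify a recommendation only when the update increases exposure) reduces to a small truth table. The sketch below uses a hypothetical function name and deliberately ignores the parenthetical caveat about reboots during the update raising vulnerability, which would need case-by-case judgment.

```python
def update_increases_exposure(source_vulnerable: bool, target_vulnerable: bool) -> bool:
    """Return True only when the update moves a cluster from a non-vulnerable
    release to a vulnerable one.

    Assumption: side effects of the update itself (for example reboots) are
    not modeled here.
    """
    return target_vulnerable and not source_vulnerable


# Enumerate the four combinations of (source, target) vulnerability.
TRUTH_TABLE = {
    (source, target): update_increases_exposure(source, target)
    for source in (False, True)
    for target in (False, True)
}
```

Only the entry for a non-vulnerable source and a vulnerable target is true, which matches the idea that updating between two vulnerable releases does not usually warrant a qualified recommendation.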

is related to

OCPBUGS-8260 Update to 4.13.0-ec.3 stuck on leaked MachineConfig

Closed

OCPBUGS-8261 Update to 4.13.0-ec.3 stuck on leaked MachineConfig

Closed

relates to

OCPBUGS-7719 Update to 4.13.0-ec.3 stuck on leaked MachineConfig

Closed

links to

kcs#7000388

openshift/cincinnati-graph-data#3243: OCPNODE-1502: Declare `LeakedMachineConfigBlocksMCO` on 4.11.26+ | 4.12.2+ | 4.13.0-ec.3+

Assignee: Qi Wang

Reporter: W. Trevor King

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 11

Created: 2023/02/17 5:32 PM

Updated: 2025/09/13 9:05 AM

Resolved: 2023/03/10 2:25 AM
