[OTA-1321] ClusterVersion status should include version-Pod error details - Red Hat Issue Tracker

Type: Epic
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- no-docs

Epic Name:
ClusterVersion status should include version-Pod error details
Work Type:
BU Product Work
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Epic Status:
To Do
Feature Link:
OCPSTRAT-1585 - Cluster-version operator version-pod failure accessability
Parent Link:
OCPSTRAT-1585 - Cluster-version operator version-pod failure accessability
Hierarchy Progress Bar:
0% To Do, 25% In Progress, 75% Done
Target Version:
openshift-4.19

Epic Goal

Currently the CVO launches a Job and waits for it to complete to get manifests for an incoming release payload. But the Job controller doesn't bubble up details about why the pod has trouble (e.g. Init:SignatureValidationFailed), so to get those details, we need direct access to the Pod. The Job controller doesn't seem like it's adding much value here, so the goal of this Epic is to drop it and create and monitor the Pod ourselves, so we can deliver better reporting of version-Pod state.
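To illustrate why direct Pod access helps: a Job's status carries Pod counts and conditions such as DeadlineExceeded, but not the per-container waiting reasons that the Pod's own status reports. The Go sketch below is a hypothetical illustration, not the CVO's implementation. It uses simplified stand-in types instead of the real k8s.io/api/core/v1 types, and the function names and container name are invented. It checks init containers first and prefixes their reasons with "Init:", roughly the way the STATUS column of oc get pods renders them.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for the few Pod status fields this
// sketch reads. The real types live in k8s.io/api/core/v1.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerStatus struct {
	Name    string
	Waiting *ContainerStateWaiting // nil when the container is not waiting
}

type PodStatus struct {
	Phase                 string
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// summarizePod returns a short description of why a version Pod is not
// making progress, or "" if no container reports a problematic waiting
// reason. Init-container problems get an "Init:" prefix.
func summarizePod(status PodStatus) string {
	for _, c := range status.InitContainerStatuses {
		// PodInitializing is the normal transitional reason, not a failure.
		if w := c.Waiting; w != nil && w.Reason != "" && w.Reason != "PodInitializing" {
			return formatWaiting("Init:"+w.Reason, c.Name, w.Message)
		}
	}
	for _, c := range status.ContainerStatuses {
		// ContainerCreating is likewise a normal transitional reason.
		if w := c.Waiting; w != nil && w.Reason != "" && w.Reason != "ContainerCreating" {
			return formatWaiting(w.Reason, c.Name, w.Message)
		}
	}
	return ""
}

// formatWaiting renders a reason, the container name, and any message.
func formatWaiting(reason, container, message string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s (container %q)", reason, container)
	if message != "" {
		fmt.Fprintf(&b, ": %s", message)
	}
	return b.String()
}

func main() {
	status := PodStatus{
		Phase: "Pending",
		InitContainerStatuses: []ContainerStatus{{
			Name:    "hypothetical-init",
			Waiting: &ContainerStateWaiting{Reason: "SignatureValidationFailed", Message: "example message"},
		}},
	}
	// Prints: Init:SignatureValidationFailed (container "hypothetical-init"): example message
	fmt.Println(summarizePod(status))
}
```

Skipping the transitional reasons keeps a Pod that is merely starting up from being reported as a failure.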

Why is this important?

When the version Pod fails to run, the cluster admin will likely need to take some action (clearing the update request, fixing a mirror registry, etc.). The more clearly we share the issues that the Pod is having with the cluster admin, the easier it will be for them to figure out their next steps.

Scenarios

oc adm upgrade and other ClusterVersion status UIs will be able to display Init:SignatureValidationFailed and other version-Pod failure modes directly. We don't expect to be able to give ClusterVersion consumers more detailed next-step advice, but hopefully easier access to failure-mode context will help them figure out next steps on their own.

Dependencies

This change is purely an updates-team/OTA CVO pull request. No other dependencies.

Contributing Teams

Development - OTA
Documentation - OTA
QE - OTA

Acceptance Criteria

Definition of done: failure modes like unretrievable image digests (e.g. quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000) or images with missing or unacceptable Sigstore signatures (with OTA-1304's ClusterImagePolicy) have failure-mode details in ClusterVersion's RetrievePayload message, instead of the current "Job was active longer than specified deadline" message.
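The acceptance criterion comes down to choosing what goes into the RetrievePayload message. A minimal Go sketch, assuming a Pod summary string such as Init:SignatureValidationFailed (empty when nothing specific is known) and a flag for whether the download deadline expired. The function name and message wording are hypothetical and are not the CVO's actual message format.

```go
package main

import "fmt"

// retrievePayloadMessage is a hypothetical helper that builds a
// RetrievePayload-style message, preferring concrete version-Pod details
// over a generic timeout.
func retrievePayloadMessage(image, podDetail string, timedOut bool) string {
	base := fmt.Sprintf("unable to retrieve release image %s", image)
	switch {
	case podDetail != "":
		// A Pod-level reason tells the admin what actually went wrong,
		// so surface it even when the deadline also expired.
		return base + ": " + podDetail
	case timedOut:
		// Nothing more specific is known; fall back to the deadline.
		return base + ": version Pod did not complete before its deadline"
	default:
		return base
	}
}

func main() {
	image := "registry.example.com/ocp-release@sha256:abc" // hypothetical image
	fmt.Println(retrievePayloadMessage(image, "Init:SignatureValidationFailed", true))
	fmt.Println(retrievePayloadMessage(image, "", true))
}
```

Letting the Pod detail win over the timeout is a deliberate choice: the expired deadline is a symptom, while the Pod's waiting reason is closer to the cause the admin needs to fix.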

Drawbacks or Risk

The audience is limited: failures like Init:SignatureValidationFailed are generic, while CVO version-Pod handling is pretty narrow. This may be redundant work if we end up getting nice generic init-Pod-issue handling like RFE-5627. But even if the work ends up being redundant, thinning the CVO stack by removing the Job controller is kind of nice.

Done - Checklist

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

CI Testing - Tests are merged and completing successfully
Documentation - Content development is complete.
QE - Test scenarios are written and executed successfully.
Technical Enablement - Slides are complete (if requested by PLM)
Other

relates to

OTA-1170 [TechPreview] Support verifying release images with Sigstore signatures

Closed

links to

openshift/cluster-version-operator#1105: OTA-1307: pkg/cvo/updatepayload: Drop the Job controller for release-manifests downloads

openshift/openshift-tests-private#22694: OTA-1328: update case OCP-21771

openshift/openshift-tests-private#22698: WIP: OTA-1328 invalid sigstore sign

Assignee: W. Trevor King

Reporter: W. Trevor King

QA Contact: Dinesh Kumar S

Votes: 0

Watchers: 4

Created: 2024/08/08 4:57 PM

Updated: 2025/02/18 4:55 AM
