-
Story
-
Resolution: Unresolved
-
OCPSTRAT-1585 - Cluster-version operator version-pod failure accessability
-
OTA 256, OTA 257, OTA 258, OTA 259, OTA 260, OTA 261, OTA 262, OTA 263
Currently, the CVO launches a Job and waits for it to complete in order to retrieve the manifests for an incoming release payload. However, the Job controller does not surface details about why the Pod is having trouble (e.g. Init:SignatureValidationFailed), so getting those details requires direct access to the Pod. The Job controller doesn't seem to add much value here, so we probably want to drop it and create and monitor the Pod ourselves.
Definition of done: failure modes such as unretrievable image digests (e.g. quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000) or images with missing or unacceptable Sigstore signatures (with OTA-1304's ClusterImagePolicy) have failure-mode details in ClusterVersion's RetrievePayload message, instead of the current "Job was active longer than specified deadline".
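One way the failure-mode details could be assembled once the CVO watches the Pod directly is sketched below. This is only an illustration, not the CVO's code: `containerStatus` is a hypothetical, pared-down stand-in for the waiting-state fields of a Kubernetes container status, and `podFailureDetails`, the set of benign reasons, and the container names are assumptions made for the example. The `Init:` prefix mirrors how the failure is displayed for init containers (e.g. Init:SignatureValidationFailed).

```go
package main

import (
	"fmt"
	"strings"
)

// containerStatus is a hypothetical, simplified stand-in for the
// waiting-state details that a Pod reports per container.
type containerStatus struct {
	Name           string
	Init           bool   // true for init containers
	WaitingReason  string // e.g. "SignatureValidationFailed", "ErrImagePull"
	WaitingMessage string
}

// benignWaitingReasons lists waiting reasons that are assumed (for this
// sketch) to be part of normal startup rather than a failure.
var benignWaitingReasons = map[string]bool{
	"PodInitializing":   true,
	"ContainerCreating": true,
}

// podFailureDetails collects one entry per container that is stuck waiting
// for a non-benign reason and joins them into a single message.
func podFailureDetails(statuses []containerStatus) string {
	var details []string
	for _, s := range statuses {
		if s.WaitingReason == "" || benignWaitingReasons[s.WaitingReason] {
			continue
		}
		reason := s.WaitingReason
		if s.Init {
			reason = "Init:" + reason
		}
		detail := fmt.Sprintf("container %s: %s", s.Name, reason)
		if s.WaitingMessage != "" {
			detail += ": " + s.WaitingMessage
		}
		details = append(details, detail)
	}
	return strings.Join(details, "; ")
}

func main() {
	statuses := []containerStatus{
		{Name: "payload-init", Init: true, WaitingReason: "SignatureValidationFailed", WaitingMessage: "signature required"},
		{Name: "payload", WaitingReason: "PodInitializing"},
	}
	fmt.Println(podFailureDetails(statuses))
}
```

A message built this way could be appended to the RetrievePayload message when the deadline expires, so the timeout text is accompanied by the underlying cause.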
It is not clear to me what we want to do with the reason, which is currently DeadlineExceeded. Keep it? Split out subsets such as SignatureValidationFailed and whatever we get for image-pull failures? Something else?
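To make the "split out some subsets" option concrete, here is a minimal sketch of one possible mapping. It is not a decision: `conditionReason` is a hypothetical helper, `ImagePullFailed` is an invented placeholder name for the image-pull subset, and treating `ErrImagePull` and `ImagePullBackOff` as that subset is an assumption. Anything unrecognized keeps the current DeadlineExceeded.

```go
package main

import "fmt"

// conditionReason maps the waiting reasons observed on the Pod's containers
// to a single condition reason. The mapping is illustrative only.
func conditionReason(waitingReasons []string) string {
	for _, r := range waitingReasons {
		switch r {
		case "SignatureValidationFailed":
			return "SignatureValidationFailed"
		case "ErrImagePull", "ImagePullBackOff":
			// Hypothetical reason name for image-pull failures.
			return "ImagePullFailed"
		}
	}
	// Fall back to the reason used today.
	return "DeadlineExceeded"
}

func main() {
	fmt.Println(conditionReason([]string{"PodInitializing", "ImagePullBackOff"}))
	fmt.Println(conditionReason(nil))
}
```

Keeping DeadlineExceeded as the fallback means existing consumers of that reason would only see a change for the newly distinguished failure modes.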
- is related to
-
OTA-1304 Cluster-update-keys should grow a manifest for ClusterImagePolicy
- Closed