OpenShift Over the Air / OTA-1307

ClusterVersion status should include version-Pod error details

    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal
    • OTA 256, OTA 257, OTA 258, OTA 259, OTA 260, OTA 261, OTA 262

      Currently, the CVO launches a Job and waits for it to complete in order to retrieve manifests for an incoming release payload. However, the Job controller doesn't bubble up details about why the Pod is having trouble (e.g. Init:SignatureValidationFailed), so getting those details requires direct access to the Pod. The Job controller doesn't seem to add much value here, so we probably want to drop it and create and monitor the Pod ourselves.

      Definition of done: failure modes like unretrievable image digests (e.g. quay.io/openshift-release-dev/ocp-release@sha256:0000000000000000000000000000000000000000000000000000000000000000) or images with missing or unacceptable Sigstore signatures (with OTA-1304's ClusterImagePolicy) have failure-mode details in ClusterVersion's RetrievePayload message, instead of the current "Job was active longer than specified deadline".
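      To make the target concrete, here is a minimal Go sketch of how a failure-mode message could be derived from a Pod's container statuses once the CVO watches the Pod directly. The struct types are simplified stand-ins for the corresponding fields in k8s.io/api/core/v1, and the helper name describePodFailure, the container name, the example message text, and the output format are all assumptions for illustration, not the CVO's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the k8s.io/api/core/v1 PodStatus fields used
// here; real code would import the upstream types instead.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerStateTerminated struct {
	ExitCode int32
	Reason   string
	Message  string
}

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// Waiting reasons that are part of normal startup and not failures.
var benignWaiting = map[string]bool{
	"PodInitializing":   true,
	"ContainerCreating": true,
}

// describePodFailure (hypothetical helper) summarizes containers that are
// waiting with a non-benign reason or that terminated with a non-zero exit
// code. It returns "" when no container reports a problem.
func describePodFailure(status PodStatus) string {
	var problems []string
	collect := func(kind string, statuses []ContainerStatus) {
		for _, cs := range statuses {
			if w := cs.State.Waiting; w != nil && w.Reason != "" && !benignWaiting[w.Reason] {
				problems = append(problems, fmt.Sprintf("%s %q is waiting: %s: %s", kind, cs.Name, w.Reason, w.Message))
			}
			if t := cs.State.Terminated; t != nil && t.ExitCode != 0 {
				problems = append(problems, fmt.Sprintf("%s %q exited with code %d: %s: %s", kind, cs.Name, t.ExitCode, t.Reason, t.Message))
			}
		}
	}
	collect("init container", status.InitContainerStatuses)
	collect("container", status.ContainerStatuses)
	return strings.Join(problems, "; ")
}

func main() {
	status := PodStatus{
		InitContainerStatuses: []ContainerStatus{{
			Name: "payload", // hypothetical container name
			State: ContainerState{Waiting: &ContainerStateWaiting{
				Reason:  "SignatureValidationFailed",
				Message: "example message from the kubelet", // made-up text
			}},
		}},
	}
	fmt.Println(describePodFailure(status))
}
```

      A string like this could then be appended to the RetrievePayload message, falling back to the existing deadline text when the helper returns an empty string.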

      It's not clear to me what we want to do with the reason, which is currently DeadlineExceeded. Keep that? Split out some subsets like SignatureValidationFailed and whatever we get for image-pull failures? Something else?
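      As one possible shape for the "split out some subsets" option, the Go sketch below maps a container's waiting reason to a condition reason and keeps DeadlineExceeded as the fallback. This is an illustration of the open question, not a decision: the helper name retrievePayloadReason and the ImagePullFailed reason are hypothetical, and it is an assumption that image-pull failures for the release image surface as the kubelet's ErrImagePull or ImagePullBackOff waiting reasons.

```go
package main

import "fmt"

// retrievePayloadReason (hypothetical helper) picks a ClusterVersion
// condition reason from a container's waiting reason. Anything it does not
// recognize keeps the current DeadlineExceeded reason.
func retrievePayloadReason(waitingReason string) string {
	switch waitingReason {
	case "SignatureValidationFailed":
		return "SignatureValidationFailed"
	case "ErrImagePull", "ImagePullBackOff":
		return "ImagePullFailed" // hypothetical reason name
	default:
		return "DeadlineExceeded"
	}
}

func main() {
	for _, r := range []string{"SignatureValidationFailed", "ImagePullBackOff", ""} {
		fmt.Printf("%q -> %s\n", r, retrievePayloadReason(r))
	}
}
```

      Keeping the default branch means existing consumers that match on DeadlineExceeded would only see a different reason when a more specific cause is known.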

            W. Trevor King (trking)
            Dinesh Kumar S