-
Story
-
Resolution: Unresolved
-
Normal
-
None
-
None
-
None
-
None
-
False
-
None
-
False
-
rhel-sst-container-tools
-
-
Summary of the issue
There have been a few incidents (RCMAUTO-6093, TRT-1625, likely others) where caching for:
https://registry.redhat.io/containers/sigstore/.../signature-2:
has lead to errors like those seen in this CI run:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-upgrade/1783654894146686976/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-marketplace" and (tostring | contains("signature"))) | (.source | tostring) + " " + .message' {"component":"kubelet","host":"ci-op-cdfql34b-f9945-g4r4x-master-2"} Failed to pull image "registry.redhat.io/redhat/certified-operator-index:v4.16": copying system image from manifest list: reading signatures: parsing signature https://registry.redhat.io/containers/sigstore/redhat/certified-operator-index@sha256=a81e6d248af48ea2071c976decc752935740b769598179756d07b8b15a4f89cc/signature-2: unrecognized signature format, starting with binary 0x3c ...
My understanding is that when the signature-retrieval function finds an invalid signature like that signature-2 (text/html, when it should have been a 404 or GPG signature), the "hey, that's not a GPG signature!" error gets bubbled up to the kubelet and on into a Kubernetes Event, without CRI-O checking to see if the successfully retrieved signature-1 was sufficient to validate the image.
This ticket is suggesting that when signature-retrieval is partially successful, CRI-O process the partial results to see if they are sufficient, and launch the cluster if they are (while continuing to fail if they are not).
Error handling gets a bit fiddly. When the partial results are sufficient to validate the image, is it worth logging and/or bubbling up "...but we did have some unexpected difficulty with signature-retrieval" warnings? And when the partial results are insufficient, do we want to return both "signature-1 wasn't trusted" and "I had trouble retrieving signature-2" in the error passed up to the kubelet? But however you go on that side of things, the partial-signature-processing pivot should increase the cluster's resiliency in the face of this kind of signature-server caching/access trouble.
- relates to
-
TRT-1625 KubePodNotReady alert in openshift-marketplace is killing payloads
- Closed