Uploaded image for project: 'Container Tools'
  1. Container Tools
  2. RUN-2116

CRI-O RFE: When validating signatures, check incomplete results for sufficiency

XMLWordPrintable

    • Icon: Story Story
    • Resolution: Unresolved
    • Icon: Normal Normal
    • None
    • None
    • None
    • None
    • False
    • None
    • False
    • sst_container_tools

      Summary of the issue

      There have been a few incidents (RCMAUTO-6093, TRT-1625, likely others) where caching for:

      https://registry.redhat.io/containers/sigstore/.../signature-2:
      

      has lead to errors like those seen in this CI run:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-upgrade/1783654894146686976/artifacts/e2e-gcp-ovn-upgrade/gather-extra/artifacts/events.json | jq -r '.items[] | select(.metadata.namespace == "openshift-marketplace" and (tostring | contains("signature"))) | (.source | tostring) + " " + .message'
      {"component":"kubelet","host":"ci-op-cdfql34b-f9945-g4r4x-master-2"} Failed to pull image "registry.redhat.io/redhat/certified-operator-index:v4.16": copying system image from manifest list: reading signatures: parsing signature https://registry.redhat.io/containers/sigstore/redhat/certified-operator-index@sha256=a81e6d248af48ea2071c976decc752935740b769598179756d07b8b15a4f89cc/signature-2: unrecognized signature format, starting with binary 0x3c
      ...
      

      My understanding is that when the signature-retrieval function finds an invalid signature like that signature-2 (text/html, when it should have been a 404 or GPG signature), the "hey, that's not a GPG signature!" error gets bubbled up to the kubelet and on into a Kubernetes Event, without CRI-O checking to see if the successfully retrieved signature-1 was sufficient to validate the image.

      This ticket is suggesting that when signature-retrieval is partially successful, CRI-O process the partial results to see if they are sufficient, and launch the cluster if they are (while continuing to fail if they are not).

      Error handling gets a bit fiddly. When the partial results are sufficient to validate the image, is it worth logging and/or bubbling up "...but we did have some unexpected difficulty with signature-retrieval" warnings? And when the partial results are insufficient, do we want to return both "signature-1 wasn't trusted" and "I had trouble retrieving signature-2" in the error passed up to the kubelet? But however you go on that side of things, the partial-signature-processing pivot should increase the cluster's resiliency in the face of this kind of signature-server caching/access trouble.

            Unassigned Unassigned
            trking W. Trevor King
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: