OpenShift Bugs / OCPBUGS-56269

Cincinnati should remove registry-cache placeholders on release scrape failures


    • Status: In Progress
    • Severity: Moderate
    • Release Note Type: Release Note Not Required

      Description of problem

      Since cincinnati#381, cincinnati/src/plugins/internal/graph_builder/release_scrape_dockerv2/registry/mod.rs injects a cache placeholder to keep other, parallel scrape workers from overlapping on a single release image. After a successful scrape, the cache entry is updated with real data. However, on failure we do not currently clear the placeholder from the cache. This leaves the graph-builder unable to recover from a failed scrape of a particular release until the container is restarted and cache building starts over.
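
      A minimal Rust sketch of the placeholder pattern and the missing cleanup, assuming a simplified cache keyed by pullspec; the names here (ScrapeCache, scrape_release, fetch_release_metadata) are illustrative, not the actual identifiers in mod.rs:

      use std::collections::HashMap;
      use std::sync::{Arc, Mutex};

      // None marks a scrape in progress (the placeholder); Some(_) is real data.
      type ScrapeCache = Arc<Mutex<HashMap<String, Option<String>>>>;

      fn scrape_release(cache: &ScrapeCache, image: &str) -> Result<(), String> {
          {
              let mut c = cache.lock().unwrap();
              if c.contains_key(image) {
                  // Another worker holds this image, or it is already cached.
                  return Ok(());
              }
              // Insert the placeholder so parallel workers skip this image.
              c.insert(image.to_string(), None);
          }
          match fetch_release_metadata(image) {
              Ok(meta) => {
                  // Success: replace the placeholder with real data.
                  cache.lock().unwrap().insert(image.to_string(), Some(meta));
                  Ok(())
              }
              Err(e) => {
                  // The cleanup this bug asks for: without this remove(), the
                  // stale placeholder blocks the image until the container
                  // restarts and cache building starts over.
                  cache.lock().unwrap().remove(image);
                  Err(e)
              }
          }
      }

      // Stand-in for the real registry fetch; always fails for illustration.
      fn fetch_release_metadata(_image: &str) -> Result<String, String> {
          Err("502 Bad Gateway while scraping release blobs".to_string())
      }

      fn main() {
          let cache: ScrapeCache = Arc::new(Mutex::new(HashMap::new()));
          let image = "registry.example.com/release@sha256:123";
          assert!(scrape_release(&cache, image).is_err());
          // The failed image is no longer blocked; a later round can retry it.
          assert!(!cache.lock().unwrap().contains_key(image));
      }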

      Version-Release number of selected component

      https://github.com/openshift/cincinnati/commit/ec72ca150f8537a3cfe292b8299570d86f4ffd15

      How reproducible

      Unclear.

      Steps to Reproduce

      Unclear. Possibly related to Quay 502s while scraping release blobs.

      Actual results

      Occasional Cincinnati shards claim 0.0.0 entries in graph-builder results:

      $ oc exec -n cincinnati-production -c cincinnati-graph-builder pod/cincinnati-6d856c4fcc-7b6zs -- curl -s 'http://localhost:8080/api/upgrades_info/graph' >graph-builder-bad.json
      $ jq -c '.nodes[] | select(.version == "0.0.0")' graph-builder-bad.json
      {"version":"0.0.0","payload":"quay.io/openshift-release-dev/ocp-release@sha256:d9796f410879103cd17066d31bfedd02546d2e6ff78b9d6b5c77ba2f56950193","metadata":{}}
      

      Expected results

      Cincinnati should both:

      • Avoid parallel scrape attempts on a single image; we don't want to regress vs. cincinnati#381.
      • Recover in subsequent scrape rounds after a series of blob-scraping failures causes us to give up on scraping a release during the current round. One way to get both properties at once is sketched below.
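
      A hedged sketch of one approach, written as a hypothetical RAII guard rather than cincinnati's actual code: the guard owns the placeholder and removes it on drop unless the scrape committed real data, so every failure path (errors, early returns, panics) frees the image for the next scrape round. The names (PlaceholderGuard, claim, commit) are illustrative assumptions.

      use std::collections::HashMap;
      use std::sync::{Arc, Mutex};

      // Same simplified cache shape as the sketch above.
      type ScrapeCache = Arc<Mutex<HashMap<String, Option<String>>>>;

      struct PlaceholderGuard {
          cache: ScrapeCache,
          key: String,
          committed: bool,
      }

      impl PlaceholderGuard {
          // Claim an image; None means another worker already holds it, which
          // preserves the no-parallel-scrapes property from cincinnati#381.
          fn claim(cache: &ScrapeCache, key: &str) -> Option<Self> {
              let mut c = cache.lock().unwrap();
              if c.contains_key(key) {
                  return None;
              }
              c.insert(key.to_string(), None); // the placeholder
              Some(PlaceholderGuard {
                  cache: cache.clone(),
                  key: key.to_string(),
                  committed: false,
              })
          }

          // On success, replace the placeholder with real data and keep it.
          fn commit(mut self, metadata: String) {
              self.cache
                  .lock()
                  .unwrap()
                  .insert(self.key.clone(), Some(metadata));
              self.committed = true;
          }
      }

      impl Drop for PlaceholderGuard {
          fn drop(&mut self) {
              if !self.committed {
                  // Error, early return, or panic: clear the placeholder so
                  // the next scrape round can retry this release.
                  self.cache.lock().unwrap().remove(&self.key);
              }
          }
      }

      fn main() {
          let cache: ScrapeCache = Arc::new(Mutex::new(HashMap::new()));
          if let Some(guard) = PlaceholderGuard::claim(&cache, "failed-image") {
              drop(guard); // simulate a failed scrape: the placeholder is cleared
          }
          assert!(!cache.lock().unwrap().contains_key("failed-image"));
          if let Some(guard) = PlaceholderGuard::claim(&cache, "good-image") {
              guard.commit("scraped metadata".to_string()); // success keeps the entry
          }
          assert!(cache.lock().unwrap().contains_key("good-image"));
      }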

              Assignee: Unassigned
              Reporter: W. Trevor King
              QA Contact: Jia Liu